ABSTRACT



Several aspects of group theory that prove useful for various signal processing applications are presented.

Chapter I begins with a discussion of signal processing activities and goals at an abstract level, and continues with a look at the mathematical underpinnings of this subject. There follows a list of specific mathematical results that seem to be of greatest relevance to signal processing.

Chapter II surveys the role played by infinite groups in modeling signals and filters. Here substantial use is made of the associated harmonic analysis; in the abelian case the dual group serves as the natural frequency domain.

Chapter III presents a fairly detailed review of the representation theory of finite groups, through the Plancherel formula. The essential idea is then to use those special unitary transforms that are also group transforms for digital signal compression and decorrelation, and to use the associated group filters as fast suboptimal Wiener (or other) filters. Initial evidence suggests that nonabelian group filters can improve on the standard DFT/FFT methods without a significant increase in computational complexity.

Chapters I and III are written at an elementary level for wide access; Chapter II is written at a higher level, requiring some background in functional and harmonic analysis. Comments are inserted throughout to suggest various generalizations of the material under discussion.

Chapter IV contains a summary of the main points and conclusions, and suggests some directions for further research, particularly on the use of finite nonabelian group transforms and filters.
MATHEMATICAL FOUNDATIONS OF SIGNAL PROCESSING
II. THE ROLE OF GROUP THEORY

I. BACKGROUND TO SIGNAL PROCESSING


In this introductory chapter we will set down some general principles and philosophy of signal processing, while attempting to avoid the details of specific applications. The field is by now far too vast and multifaceted to permit any simple summary or encapsulation. Our aims will be modest: to agree on some terminology, some historical background, and some of the goals of signal processing. There follows a brief résumé of the problems and methods of signal processing. Together this material is intended to furnish a compressed understanding of the field at an abstract level.

An inevitable consequence of an author’s professional experience and personal predilections is a particular and usually subjective answer to the basic questions: What is interesting? What is important? In the present context this results in a neutral, mathematical approach, free of implementation considerations and the requirements of specific technologies. We are searching for mathematical paradigms of some elegance and widespread applicability. There is some analogy here with physical theories and formulas, where a single principle can be given a crisp mathematical formulation and then applied in a variety of real situations. There are also certain more specific analogies between physics and signal theory, based on a common underlying mathematical formulation. A familiar example is the uncertainty principle which, in its mathematical essence, is an instance of the local/global duality between Fourier transform pairs (see Section II.4 below). But such analogies should not be pressed too far, because the basic tools of signal processing (Fourier transform, bandlimited function, stationary process, etc.) are, after all, mathematical and not empirical in nature. We have greater freedom in selecting signal models than physical models; utility and efficiency are the major criteria, rather than agreement with experimental data.

This report represents a continuation of the author’s interest in the foundations of signal processing. An earlier paper [1] provided a detailed operator-theoretic treatment of discrete-time single-channel signal processing. In the terminology of Section II.2 below, such a signal would be described as a weakly stationary random process on the group of integers. Extensions of some of the basic results of [1] to modeling and filtering processes on more general groups are given in Sections II.2-II.4. Later, in 1983, the author gave a course, “Fundamentals of Signal Processing”, at the Lincoln Laboratory, which remains available on videotape. There the view was taken that a common goal of signal processing activity is to identify a model which in some sense explains a given pattern of observations. If the model is considered as an unknown element of a suitable Hilbert space, and is assumed related to the observations by a linear (and often compact) transformation, then estimates of the model can again be constructed by operator-theoretic methods; for example, by the use of singular vector expansions, pseudoinversion, and regularization. The resulting estimates can often be interpreted as various types of spline functions, and also as Bayes estimates for an appropriate prior. This methodology can also be viewed as yielding particular cases of general optimal algorithms [2].

In contrast to these purely Hilbert space techniques, we are going to consider below the role played in signal processing by another fundamental mathematical structure: groups. In Chapter II we discuss several signal models that can be defined on and/or analyzed in terms of an underlying group structure. This material is presented as a rapid survey at a fairly advanced level. In Chapter III we take the opposite tack and go into some more elementary material in greater detail. There the emphasis is on ‘finite’, that is, the application of finite groups to finite-dimensional signal processing. We discuss the use of group transforms, especially those that are ‘fast’, for coding and pattern recognition purposes, and of group filters for signal estimation. This material is of real practical value, and there are ample opportunities for further research. Our intent is primarily to expose the basic concepts and issues.

I.1 WHAT IS SIGNAL PROCESSING?

Let us begin with a provisional definition: a signal is the output of an array of sensors configured in time and/or space. Accordingly, a signal represents observations made of some physical process and, we presume, it carries information pertaining to the state of some physical system.

Most signals of engineering interest occur in the context of remote sensing. We will understand this term to refer, in a generic way, to observations made at a distance by devices sensitive to some sort of energy. This energy could be electromagnetic in nature (gamma rays and X-rays, visible light, infrared, radio and television, etc.), acoustical, or vibrational (mechanical, seismic). Remote sensing systems may be classified as active or passive, according to whether the received energy is produced by a man-made transmitter and then scattered by an object of interest, or is produced (or reflected) by the object alone.

Passive systems naturally occur in the contexts of astronomy, photography, satellite scanning, and geophysical recording. Of course, in a different, nonengineering direction, we might include economic systems. Active systems include radar/sonar, a variety of medical imaging devices (CAT, NMR, PET, US, etc.), industrial procedures employing CAT-like equipment for quality testing purposes, often called nondestructive evaluation (NDE), and seismic prospecting.

Naturally, each of the above areas is a major field in itself, and so there is by now a massive literature in signal processing and its applications: many conference proceedings, including the annual ICASSP Proceedings from the IEEE, several specialized journals, and 200 or so books.

The present report is intended to have only a small overlap with this literature.

It is possible to partition the history of signal processing in the 20th century into 3 eras, as indicated in Table 1-1. Of course, such brevity cannot do justice to all the developments and authors involved in the signal processing business; our table is intended only to be suggestive rather than complete.

TABLE 1-1
Eras of Signal Processing in the 20th Century

1910-1940
  Physical: Vacuum tubes, lumped circuits
  Analytical: Impulse response, transfer function, transform methods
  Names: Fourier/Laplace, Bode, Nyquist

1940-1960
  Physical: Microwave circuits
  Analytical: Statistical concepts (correlation, matched filters, information theory)
  Names: Gabor, Shannon, Wiener

1960-Present
  Physical: Digital computer (permits realization of arbitrary transfer functions), integrated circuits, optical technology
  Analytical: Digital filters, spectrum estimation, fast Fourier transform, linear inverse theory
  Names: Kailath, Oppenheim, Slepian, Tukey

What we do want to stress is that current signal processing activity draws on many disciplines within mathematics, as well as on computer science and integrated circuit technology. In general there is an ongoing, dynamic interplay between algorithms and architecture. A broad survey of the field at present is given in the collection [4], edited by T. Kailath; again, there is little overlap with the present work.

Let us now restate our provisional definition of a signal as follows. The sensor output referred to at the beginning of this section will now be called data (often also measurements or observations). The phrase ‘signal processing’ will henceforth be replaced by ‘data processing’, and will be taken to mean the purposeful modification of data in order to eliminate redundancy or to extract information. Let us in turn take this last phrase to mean the construction (or identification) of a mathematical model which, in some sense, ‘explains’ the data. Finally, we make the following definition: a signal is an unobservable mathematical quantity related to the data whose value can be inferred from an identified model.

What do we want to learn from data processing? A couple of very abstract goals were stated above. More specifically, we list the following possible goals as indicative of the major problems of the field:

• Data compression, decorrelation, and feature extraction (efficient representation for display, storage, transmission, pattern recognition, etc.)

• Recognize significant aspects of the data (trends, periodicities, etc.)

• Signal detection

• Signal estimation

• Simulation (use of an identified model to generate more, artificial, data which, in some sense, behaves like the original data)

Now, in terms of a specific goal, one of the most interesting ensuing questions (subjectively speaking, again) concerns the nature of the actual operations that we elect to perform on the data. How are these to be justified? Why do we do one thing and not another? Some of the factors which impinge on this decision are the following:

• goal of the processing (as above)

• model structure (physical or synthetic)

• prior information and constraints

• performance criterion

• nature and amount of data

• computational time available (circuit speed, necessity for real-time operation, etc.)

In a way, these factors collectively define an abstract paradigm for data processing, in that we may expect their specification to (eventually) result in the choice of a particular numerical algorithm.

We might offer two further comments about the design and use of algorithms. We have just indicated that many factors must be specified before an algorithm can possibly emerge. This is exactly the reason there is such a diversity of data processing algorithms extant, and why two of them are so often not directly comparable. Thus, when trying to select from among the legion of available algorithms, one’s first task is to be sure that the algorithm was designed to achieve the user’s goal and that it is consistent with the other factors just listed. There are other general desiderata also, when selecting an existing algorithm or attempting to design one’s own. There should be a ‘good question’ in the background, that is, a well-defined or, more technically, a well-posed problem whose solution exists uniquely for given data and depends continuously on the data. The (approximate) solution should be efficiently computable and stable with respect to measurement errors (‘noise’). (Even well-posed problems often result in numerically ill-conditioned equations as, for example, in the common case of computing the pseudoinverse of a linear operator of finite rank. Naturally, ill-conditioning will amplify any measurement noise present.) These desiderata, along with that of rapid convergence as the associated discretization is refined, give guidance in algorithm design, but they also show that considerable care and effort are required. Thus, the first comment about algorithm design is that it is both hard to do well and hard to compare competing results.
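The parenthetical remark on ill-conditioning can be made concrete with a small numerical experiment. The Python/NumPy sketch below is illustrative only: the Hilbert test matrix, the noise level, the cutoff, and the helper names `hilbert_matrix` and `tsvd_solve` are our own assumptions, not taken from the text. It compares a direct solve with a truncated-SVD pseudoinverse, one simple form of the regularization mentioned earlier in this chapter, on a nonsingular but severely ill-conditioned system.

```python
import numpy as np

def hilbert_matrix(n):
    # Classic ill-conditioned test matrix (hypothetical choice): A[i, j] = 1 / (i + j + 1)
    i = np.arange(n)
    return 1.0 / (i[:, None] + i[None, :] + 1)

def tsvd_solve(A, b, rel_cutoff):
    # Truncated-SVD pseudoinverse: keep only singular values above rel_cutoff * s_max
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rel_cutoff * s[0]
    coeffs = (U[:, keep].T @ b) / s[keep]
    return Vt[keep].T @ coeffs

n = 10
A = hilbert_matrix(n)
x_true = np.ones(n)
rng = np.random.default_rng(0)
b_noisy = A @ x_true + 1e-8 * rng.standard_normal(n)  # tiny simulated measurement noise

x_naive = np.linalg.solve(A, b_noisy)                 # exact inverse: amplifies the noise
x_reg = tsvd_solve(A, b_noisy, rel_cutoff=1e-6)       # regularized pseudoinverse

err_naive = np.linalg.norm(x_naive - x_true)
err_reg = np.linalg.norm(x_reg - x_true)
print(f"cond(A) = {np.linalg.cond(A):.1e}")
print(f"direct-solve error = {err_naive:.1e}, truncated-SVD error = {err_reg:.1e}")
```

Discarding the smallest singular values trades a small bias for a large reduction in noise amplification; `rel_cutoff` is a tuning parameter chosen here by hand, and choosing it well is itself part of the problem.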

The second comment pertains, at a more philosophical level, to this entire business of what might be termed the algorithm-centered approach to engineering problems. In this approach, as just outlined, one designs an algorithm based on the physics and geometry of a particular observational situation and in accord with the preceding general guidelines; then another algorithm is proposed and analyzed, and so forth. The existence of so many algorithms in the engineering literature suggests that this approach is indeed widespread.

An alternative, gradually acquiring some well-deserved acceptance, is what might be termed the information-centered approach. Here one basically indicates the type and quantity of information (that is, models plus data) available about an unknown signal, along with a performance criterion. The theory then reveals an optimal algorithm which results in the minimum possible error relative to the assigned performance criterion. The theory can also yield bounds on problem complexity (basically, the cost to carry out the solution) and optimal information (of a given type and quantity). This latter, and newer, approach to dealing with problems involving uncertainty has been developed by J. Traub and co-workers and presented in the monographs [2,3]. More recently, it is beginning to be discovered by engineers [5].

We conclude this section with a few remarks about the epistemological aspects of signal/data processing. In this, as in any other instance of scientific inquiry, it must be recognized that it is not possible to ‘know reality’ but, at best, how reality interacts with some sort of probe. That is, to paraphrase Kant, our representation of things does not conform to these things as they are ‘in themselves’; rather, they conform to our mode of representation. So we can make sense of the world only by imposing upon it some structure originating from the mind. This is the sense of the dictum of Protagoras that ‘man is the measure of all things.’ Therefore, we must acknowledge that, because of the plan of observation, the observer’s presence is inevitable and ubiquitous in the final result. It follows that knowledge does not represent certainty, and hence is not final but rather open to improvement; we approximate truth by stages. We can never reach complete knowledge of reality, but we can obtain encodings (models) of it in terms of prior knowledge of the plan of observation. These models should be, as already suggested, the solutions to ‘good questions’; each of them provides some insight into the overall situation, and a set of such models will eventually permit decisions to be made. We accept the unfortunate fate that most (all?) important decisions must be made on the basis of insufficient information, and we do as well as possible by means of the scientific method in general and data processing in particular.

Such philosophical speculations associated with scientific induction can be traced back to Plato and the question: “How can you seek what you do not know?” They were developed further by R. Descartes, I. Kant, D. Hume, and more recently by A. Eddington, K. Popper, R. von Mises, inter alia. A convenient survey is given by W. Salmon in [6].
I.2 MATHEMATICAL METHODS IN SIGNAL PROCESSING


One way to acquire at least a superficial understanding of a scientific field is to organize it conceptually into problems, methods, and results. We have already listed the major problems in signal processing in a generic way, and we will look at a few of these in more detail in Chapter III. We next want to briefly survey the major mathematical fields useful in signal processing, and then, in Section I.3, to give a few examples of specific mathematical results that are of particular usefulness for signal processing. Several further mathematical models and theorems occur in the next chapter.

Once more we take time out to emphasize the subjective nature of our approach, as well as the eclectic nature of our subject. Signal processing draws upon many disciplines besides mathematics; for example, physics, systems and information theory, numerical analysis and computer science, digital circuit technology, etc. One need not be expert in all these fields (if that were possible!) in order to do signal processing. In particular, while a great deal of mathematics can be involved (as in physics), not all of it is necessary to work productively on many problems. Thus each individual makes his own decision about how far to go in certain directions.

Having reiterated this position, we now survey some useful areas of mathematics. As noted below in Section II.1, most practical data processing involves the manipulation of finite arrays of real or complex numbers. Thus it is immediately apparent that the methods of linear algebra will be crucial. Not only must the basic concepts [rank, (pseudo)inverse, eigenvalue/eigenvector, condition, etc.] be mastered, but of equal importance is the need for stable numerical algorithms to compute these quantities. For example, a variant of Pisarenko’s method of harmonic retrieval identifies the number of pure tones observed in noise as the rank of a certain hermitian matrix. For numerical determination of rank, pseudoinverse, and the solution of ill-conditioned systems of linear equations, the singular value decomposition is now the method of choice [7, 8]. Unfortunately, it is very computationally intensive and hence more suited to off-line treatment in sophisticated hardware. For real-time on-board applications the search for efficient high-speed realizations of linear algebraic procedures continues, both in the area of improved (‘parallel’) algorithms and in that of specialized hardware (array processors) utilizing VLSI technology.
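As a rough illustration of the rank idea (not Pisarenko’s complete method, which also recovers the tone frequencies from the noise eigenvector), the sketch below builds the exact covariance matrix of a few complex tones with independent random phases in white noise, and counts the eigenvalues that rise above the smallest one. Complex exponentials are used so that each tone adds exactly one to the rank (a real sinusoid would add two). The frequencies, powers, noise variance, matrix size, tolerance, and the helper names are all hypothetical; in practice the matrix would be estimated from data, and choosing the threshold is then the difficult part.

```python
import numpy as np

def tone_vector(freq, m):
    # Samples e^{2 pi i f n}, n = 0..m-1 (freq in cycles per sample)
    return np.exp(2j * np.pi * freq * np.arange(m))

def tone_covariance(freqs, powers, noise_var, m):
    # Exact m x m covariance: sum of p * e e^H over tones, plus white noise noise_var * I
    R = noise_var * np.eye(m, dtype=complex)
    for f, p in zip(freqs, powers):
        e = tone_vector(f, m)
        R += p * np.outer(e, e.conj())
    return R

def count_tones(R, rel_tol=1e-8):
    lam = np.linalg.eigvalsh(R)   # real eigenvalues of the Hermitian matrix, ascending
    noise_floor = lam[0]          # smallest eigenvalue serves as the noise-variance estimate
    # Rank of R - noise_floor * I: number of eigenvalues clearly above the floor
    k = int(np.sum(lam - noise_floor > rel_tol * lam[-1]))
    return k, noise_floor

R = tone_covariance(freqs=[0.10, 0.23, 0.37], powers=[1.0, 0.5, 2.0], noise_var=0.1, m=8)
k, sigma2 = count_tones(R)
print(k, round(sigma2, 6))
```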

Due to the ubiquitous presence of noise and impurities in observational devices, the data to be analyzed is never exact; that is, in the words of John Tukey, “what is measured is not the truth”. Thus, as in many other aspects of life in general, it is necessary to look through noise when constructing estimates of unobservable signals. The mathematical methods for treating errors in data and the resulting estimates come from the fields of probability and statistics. The vital concepts are conditional distributions and expectations, limit theorems, the behavior of empirical distributions, probability density estimation, Neyman-Pearson theory, etc. Very commonly, received data is modeled as a realization of a stochastic process over some domain. The major theme of this report is to work with this setup under the further assumption that the domain has the structure of a group. Then the process can be viewed as a probability measure over a space of functions (sample paths) defined on the group. Such spaces often possess a great deal of rich mathematical structure (a Hilbert space, a Banach algebra, etc.), and so detailed (orthogonal) decompositions of the data are available.
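A minimal example of the kind of orthogonal decomposition meant here, under assumptions of our own choosing: take the group to be the cyclic group Z_N, and a weakly stationary process on it with a hypothetical autocovariance r. Stationarity makes the covariance matrix circulant (it depends only on the group difference m - n), and the discrete Fourier transform, whose basis vectors are the characters of Z_N, turns the data into uncorrelated coefficients whose variances form the power spectrum.

```python
import numpy as np

N = 8
rho = 0.5
k = np.arange(N)
# Hypothetical autocovariance on Z_N: r(k) = rho^d(k, 0), where d is the distance
# around the cycle, so that r(k) = r(-k mod N)
r = rho ** np.minimum(k, N - k)

# Stationarity on Z_N: Cov(X_m, X_n) = r((m - n) mod N), a circulant matrix
C = r[(k[:, None] - k[None, :]) % N]

F = np.fft.fft(np.eye(N)) / np.sqrt(N)   # unitary DFT matrix (rows are sampled characters)
D = F @ C @ F.conj().T                    # covariance of the transformed data

spectrum = np.fft.fft(r).real             # power spectrum: DFT of the autocovariance
print(np.allclose(D, np.diag(spectrum)))  # True: the DFT coefficients are uncorrelated
print(np.round(spectrum, 4))
```

For nonabelian groups the characters are replaced by irreducible unitary representations, which is the direction taken in Chapter III.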
We think it fair to say that the mathematical mainstream of the first half of the 20th century involved the simultaneous development of the manifestations of the concepts of groups and Hilbert spaces. These interlocked through the concept of a group representation, that is, a (strongly) continuous homomorphism of a given locally compact group into the unitary group on some Hilbert space. These developments were driven especially by the (then) new ideas and requirements of quantum mechanics, and are associated with the names of Hilbert, Weyl, von Neumann, and Stone, along with, of course, many others. Other early motivations came from studies in integral equations (Riesz), the formalization of Brownian motion (Wiener), models for prediction of time series (Wold, Kolmogorov), and Fourier series expansions (see below).

While it is true that all Hilbert spaces of the same orthogonal dimension are abstractly equivalent, they can differ greatly according to the nature of their elements. For signal processing applications, useful Hilbert spaces can be considered to belong to one of three main types: L² spaces, reproducing kernel Hilbert spaces (RKHS), and spaces of Hilbert-Schmidt operators acting on a fixed underlying Hilbert space. If P is a probability measure, L²(P) is the corresponding space of random variables with finite variance. If G is a finite or compact topological group, L²(G) is defined with respect to the associated Haar measure and forms a Hilbert space of exceptionally rich structure known as an H*-algebra [9]. The possibility of using such a space as a setting for signal or data models is provocative and is given a preliminary look in Chapter III. By contrast, reproducing kernel spaces generally contain very smooth or even analytic functions, and these elements can naturally serve as models of unobservable signals about which some information is available as data or constraints. Examples are the various Sobolev spaces, whose elements are real-valued functions of one or several variables, each of a certain fixed degree of differentiability; Fock spaces of (Volterra) series of homogeneous polynomials on a separable Hilbert space; and Hardy, Bergman, and Paley-Wiener spaces of analytic functions. Paley-Wiener spaces are of special importance, constituting as they do models of strictly bandlimited signals. Finally, Hilbert-Schmidt operators generalize matrices normed by the rule

$$ \|[a_{ij}]\|^2 = \sum_{i,j} |a_{ij}|^2 , $$

and can serve as models of 2-dimensional data. On an L² space, for instance, the Hilbert-Schmidt operators are just the integral operators defined by a square-integrable kernel. In general these operators are compact, with square-summable singular values.
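For matrices, the Hilbert-Schmidt norm above is the Frobenius norm, and it also equals the square root of the sum of the squared singular values. The NumPy sketch below checks this identity on a random complex matrix, then discretizes an integral operator with the square-integrable kernel K(s, t) = min(s, t) on [0, 1] (our choice of example; it is the Brownian-motion covariance) to show that the matrix norm approximates the L² norm of the kernel, here ∫∫ min(s, t)² ds dt = 1/6. The helper `hs_norm_sq` is a hypothetical name.

```python
import numpy as np

def hs_norm_sq(A):
    # Squared Hilbert-Schmidt (Frobenius) norm: sum of |a_ij|^2
    return float(np.sum(np.abs(A) ** 2))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(hs_norm_sq(A), np.sum(s ** 2)))   # True: equals the sum of squared singular values

# Midpoint-rule discretization of the integral operator with kernel min(s, t):
# matrix entries h * K(s_i, t_j), so that (K @ f) approximates the integral of K(s, t) f(t) dt
n = 200
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h
K = h * np.minimum(grid[:, None], grid[None, :])
print(hs_norm_sq(K))   # close to 1/6, the squared L2 norm of the kernel
```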

Now Hilbert space theory primarily concerns the study of operators acting between these spaces. In data processing, operators occur both in the preliminary modeling and in the solution procedure. Thus we often assume that noise-free data occurs as the value assumed by some operator at the unknown signal. The operator models the effect of a communications channel and/or a measuring device, perhaps after some linearization and approximation. For example, a satellite detector might measure upwelling radiation at selected frequencies. This radiation at frequency ν is (approximately) related to temperature T by an integral operator of the form

$$ I_\nu(p) = \int_{\Omega_p} K_\nu(p, q)\, T(q)\, dq , $$

where Ω_p is the volume within the detector field of view when the subsatellite point is p, and the kernel K_ν is determined from the equations of radiative transfer. Here we think of the atmospheric temperature T as the not-directly-observable signal of interest (perhaps as an input to some weather forecasting program), and the values of I_ν as the data.
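To see how such a model becomes a matrix problem, here is a deliberately simplified one-dimensional sketch. Everything specific in it is hypothetical: a normalized Gaussian of width w stands in for the radiative-transfer kernel K_ν, the domain Ω_p is replaced by [0, 1] for every p, and the integral is approximated by the midpoint rule. The rapid decay of the singular values of the resulting matrix is typical of smoothing kernels, and indicates why recovering T from values of I_ν is ill-conditioned.

```python
import numpy as np

# Hypothetical stand-ins: Gaussian kernel of width w replaces K_nu(p, q); Omega_p = [0, 1]
n, w = 100, 0.1
h = 1.0 / n
q = (np.arange(n) + 0.5) * h          # midpoint quadrature nodes
p = q                                 # observation points (same grid, for simplicity)
kernel = np.exp(-((p[:, None] - q[None, :]) ** 2) / (2 * w**2)) / (np.sqrt(2 * np.pi) * w)
A = h * kernel                        # discretized forward operator: I approx A @ T

T_const = np.ones(n)                  # a constant "temperature profile"
I_const = A @ T_const
print(I_const[n // 2])                # near 1 away from the edges: this kernel integrates to 1

s = np.linalg.svd(A, compute_uv=False)
print(s[-1] / s[0])                   # tiny ratio: the smoothing operator is nearly singular
```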

On the other hand, with the exception of some simple detection problems, virtually all data processing applications involve the transformation of one (received) signal into another. Hence, by restricting attention to linear transformations and assuming that the data can be considered mathematically as belonging to some Hilbert space, a powerful and general theory can be built up from the already available operator theory. Let us therefore agree to define a (linear) data processor to be an operator acting from the data or sample space into a signal or model space. Note that once these two spaces are specified, then so are the operators between them. That is, the operators exist independently of any particular data analysis situation, and our task is to select the one that best fits the available information and performance criterion. In practice, a suboptimal choice may be made for reasons of computational efficiency. This theme is elaborated on in Chapter III (see also the comments on the ‘spline algorithm’ in Section I.3).

At this point we have offered some general motivation for the use of linear algebra/operator theory and probability/statistics in signal processing models. We have also alluded to the close connection between Hilbert spaces and groups through the representation concept. We now want to conclude this section with an attempt to delineate the basic role of group theory in signal processing. More detailed discussions of specific technical issues occur throughout the rest of this report.

It is commonplace that Fourier techniques (series and transforms) have been, and continue to be, of decisive importance in signal processing modeling. The historical reasons for this go back to the study of lumped linear time-invariant electrical circuits. The linear differential equations relating circuit voltage (or current) to an external voltage (or current) source preserve the frequency of a sinusoidal input. More generally, any linear time-invariant system has the harmonic exponentials t ↦ e^{iκt} as eigenfunctions, for appropriate choice of κ; the corresponding eigenvalue is the system’s frequency response at κ. Hence the system response to any (reasonable) input can be determined from the Fourier series expansion of that input. Fourier methods also permit solution of models for general electromagnetic radiation based on the wave equation. More recently, there has been an explosive development in computer-based methods for processing discrete data by means of various ‘fast’ algorithms for the discrete Fourier transform. Sometimes the intent is simply to do fast digital data filtering, which is essentially a convolution of a data vector with the filter impulse response. Alternatively, spectral analysis of stationary data may be the goal, and here Fourier methods are required even for the very definition of the desired quantity (the power spectral density function).
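In the discrete, periodic setting (the cyclic group Z_N), both points can be checked directly. The following NumPy sketch, with hypothetical random data and impulse response and illustrative helper names, verifies that FFT-based filtering agrees with direct circular convolution, and that a sampled complex exponential passes through the filter unchanged in shape, scaled by the corresponding value of the transfer function (the DFT of the impulse response).

```python
import numpy as np

def circular_convolve_direct(x, h):
    # y[m] = sum_k h[k] x[(m - k) mod N]: O(N^2) operations
    n = len(x)
    return np.array([sum(h[k] * x[(m - k) % n] for k in range(n)) for m in range(n)])

def circular_convolve_fft(x, h):
    # Convolution theorem on Z_N: pointwise product of DFTs, O(N log N) via the FFT
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h))

rng = np.random.default_rng(2)
N = 16
x = rng.standard_normal(N)                 # hypothetical data vector
h = rng.standard_normal(N)                 # hypothetical filter impulse response
print(np.allclose(circular_convolve_direct(x, h), circular_convolve_fft(x, h)))  # True

# A sampled complex exponential is an eigenvector; its eigenvalue is the transfer function H[kk]
H = np.fft.fft(h)
kk = 3
e = np.exp(2j * np.pi * kk * np.arange(N) / N)
print(np.allclose(circular_convolve_direct(e, h), H[kk] * e))  # True
```

The direct version costs O(N²) operations, while the FFT route costs O(N log N); that gap is the practical content of the ‘fast’ algorithms mentioned above.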

Now, all the Fourier transforms and expansions just alluded to are merely instances of a much more general situation. This is the field of ‘abstract harmonic analysis’, the study of functions defined on locally compact groups in terms of the associated (unitary) representations. We can’t seriously contemplate summarizing this field, which has been under active development since the publication of Weil’s book [10] in 1941. The first English source for this material is Loomis [9], followed by Rudin [11] and the encyclopedic treatise of Hewitt and Ross [12], among others. However, we will offer just a few remarks aimed at providing a little perspective.
If we think of an element x of a (separable) Hilbert space as a mathematical model of a signal or a wave, we know from the elementary theory of such spaces that x can be expressed as a convergent series in terms of an arbitrary orthonormal basis {u_n}:

$$ x = \sum_n c_n u_n , $$

and in fact the coefficients c_n are specified uniquely by the inner products c_n = ⟨x, u_n⟩. A problem arises in view of the arbitrariness here: there is generally no natural way to carry out this decomposition. If, for example, x is a function defined on a compact interval, then permissible choices of {u_n} range from scaled trigonometric functions to orthogonal polynomials to step functions. To attempt a physical analogy, we might say that, unlike a specific device (a clock, a car, etc.), a wave has no intrinsic parts.

A partial remedy exists if there is some additional structure available. Suppose in particular
that our Hilbert space is $L^2(G)$, where G is some unimodular locally compact group and the
integration is done with respect to its Haar measure. (The existence of a positive regular Borel
measure on a locally compact group which is invariant with respect to left, or right, translations,
and which is unique up to a constant positive multiple, was the first major success of abstract
harmonic analysis in the 1930s. Such measures are called left, or right, Haar measures, and
permit the notion of invariant integration over G. Groups for which every left Haar measure is
also a right Haar measure are termed unimodular. These include abelian, discrete, and compact
topological groups, as well as semisimple Lie groups; in such cases we may speak simply of the
Haar measure, which is unique up to a normalization constant. This constant is usually chosen so
that the measure of G is 1 when G is compact, and so that each one-point set has measure
1 when G is discrete, although these two conventions clearly conflict when G is finite. When G is the
additive group $\mathbb{R}^n$, the Haar measure is proportional to ordinary Lebesgue measure.) Suppose further
that G is compact. Then there are distinguished orthonormal bases for $L^2(G)$ whose
elements are, up to normalization, of the form

$$ u(g) = \langle U(g)y, z \rangle , \qquad g \in G , $$

where U is an irreducible unitary representation of G on a (necessarily finite dimensional) Hilbert
space and y, z are vectors in that space. If G is separable (an inessential restriction for practical
purposes), there are only countably many inequivalent irreducible representations. The
corresponding expansion of elements of $L^2(G)$ in terms of these special basis elements is a
generalization of the classical Fourier series, to which it reduces when G is the circle group (the
multiplicative group of complex numbers of modulus 1). Actually the classical theory is a bit easier
because the circle group is abelian, so that the irreducible representations are one-dimensional
(called ‘characters’ in this case), and hence the representation-theoretic machinery can
effectively be avoided. The basic point here is that the role of the simple harmonic exponentials
$t \mapsto e^{int}$ for the circle group is played more generally, for any compact group, by its irreducible
representations. This collection of results is known as the Peter-Weyl theory.
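For a finite group, with Haar measure normalized so that m(G) = 1, the orthonormality asserted by the Peter-Weyl theory reduces to the Schur orthogonality relations and can be checked directly. The sketch below is an illustration under assumptions not in the text: it takes G to be the dihedral group D3 of order 6 (isomorphic to the symmetric group S3), realized as the symmetries of an equilateral triangle, and uses its three irreducible representations (trivial, sign, and the 2-dimensional one). The functions $\sqrt{d_U}\,U_{ij}(g)$, with $d_U$ the dimension of U, are matrix coefficients of the form $\langle U(g)y, z\rangle$ for suitably scaled basis vectors y, z. The helper names are hypothetical.

```python
import numpy as np

def dihedral_d3():
    """The six elements of D3 as 2x2 real orthogonal matrices (3 rotations, 3 reflections)."""
    def rot(t):
        return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    s = np.array([[1.0, 0.0], [0.0, -1.0]])  # reflection across the x-axis
    rotations = [rot(2 * np.pi * k / 3) for k in range(3)]
    return rotations + [s @ r for r in rotations]

def irreps(g):
    """The irreducible unitary representations of D3, evaluated at the element g."""
    return {
        "trivial": np.array([[1.0]]),
        "sign": np.array([[np.linalg.det(g)]]),  # +1 on rotations, -1 on reflections
        "standard": g,                          # the 2-dimensional representation
    }

def peter_weyl_basis(group):
    """Rows are the functions g -> sqrt(d) * U_ij(g), one row per irrep U and index pair (i, j)."""
    rows = []
    for name in ("trivial", "sign", "standard"):
        d = irreps(group[0])[name].shape[0]
        for i in range(d):
            for j in range(d):
                rows.append([np.sqrt(d) * irreps(g)[name][i, j] for g in group])
    return np.array(rows)

G = dihedral_d3()
B = peter_weyl_basis(G)            # 1 + 1 + 4 = 6 functions on a group of order 6
gram = B @ B.conj().T / len(G)     # inner products in L2(G), Haar measure with m(G) = 1
print(np.allclose(gram, np.eye(len(G))))
```

Since there are exactly |G| = 6 such functions, their orthonormality already makes them a basis of $L^2(G)$, the finite analog of the completeness part of the Peter-Weyl theorem.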
A final remark pertains to the all-important concept of the Fourier transform. This operator
is defined for every $f \in L^1(G)$ by the rule

$$ \hat{f}(\gamma) = \int_G f(g)\, \overline{\langle g, \gamma \rangle}\, dm_G(g) \qquad (I.1) $$

if G is abelian and γ is a character on G, or by

$$ \hat{f}(\lambda) = \int_G f(g)\, \lambda(g^{-1})\, dm_G(g) \qquad (I.2) $$

if G is compact. In each case $m_G(\cdot)$ denotes the Haar measure on G. In Equation (I.1), $\hat f$ is a
continuous function defined on the dual group Γ, which consists of all the (continuous)
characters on G, while in Equation (I.2) $\hat f$ is defined on the unitary dual object $\hat G$, which consists
of all equivalence classes of (continuous) irreducible unitary representations of G. Note that in
this latter case each value $\hat f(\lambda)$ is an operator on a certain finite dimensional Hilbert space. In
each case $\hat f$ is termed the Fourier or group transform of f, and uniquely determines f. In the
abelian case, there is a choice of Haar measure $m_\Gamma$ on Γ such that, if $\hat f \in L^1(\Gamma)$, then the inversion
formula

$$ f(g) = \int_\Gamma \langle g, \gamma \rangle\, \hat{f}(\gamma)\, dm_\Gamma(\gamma) \qquad (I.3) $$

holds a.e. on G. (Usually, in fact, f is constrained to some class of smooth functions, such as the
Schwartz class or the continuous positive definite functions; in such cases Equation (I.3) is
valid on all of G.) Also in this case the transform can be defined on $L^2(G)$ so as to be a unitary
operator onto $L^2(\Gamma)$. In the compact case, with proper interpretation of $L^2(\hat G)$, the transform is again
a unitary operator. These statements are known as Plancherel’s theorem, and can, in fact, be
extended to the general (separable) unimodular group [13], but this generality is not required
below.
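On a finite group, with Haar measure normalized so that m(G) = 1, the integral in (I.2) becomes an average, and Plancherel's theorem takes the form $\|f\|^2 = \sum_\lambda d_\lambda \|\hat f(\lambda)\|_{HS}^2$, where $d_\lambda$ is the dimension of λ and HS denotes the Hilbert-Schmidt norm; the inversion formula reads $f(g) = \sum_\lambda d_\lambda \operatorname{tr}(\hat f(\lambda)\lambda(g))$. The sketch below checks both numerically. It assumes, for illustration only, the dihedral group D3 of order 6 realized by 2×2 orthogonal matrices and a random test function f; the helper names are hypothetical.

```python
import numpy as np

def dihedral_d3():
    """The six elements of D3 as 2x2 real orthogonal matrices (3 rotations, 3 reflections)."""
    def rot(t):
        return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    s = np.array([[1.0, 0.0], [0.0, -1.0]])
    rotations = [rot(2 * np.pi * k / 3) for k in range(3)]
    return rotations + [s @ r for r in rotations]

def irreps(g):
    """Trivial, sign, and 2-dimensional irreducible representations of D3 at g."""
    return {"trivial": np.array([[1.0]]),
            "sign": np.array([[np.linalg.det(g)]]),
            "standard": g}

NAMES = ("trivial", "sign", "standard")

def group_transform(f, group):
    """Equation (I.2) with m(G) = 1: hat f(lam) = (1/|G|) sum_g f(g) lam(g^{-1})."""
    # For orthogonal matrices g^{-1} = g.T, and each irrep is a homomorphism.
    return {name: sum(fg * irreps(g.T)[name] for fg, g in zip(f, group)) / len(group)
            for name in NAMES}

def inverse_transform(fhat, group):
    """f(g) = sum_lam d_lam * trace(hat f(lam) lam(g))."""
    return np.array([sum(fhat[n].shape[0] * np.trace(fhat[n] @ irreps(g)[n]) for n in NAMES)
                     for g in group])

G = dihedral_d3()
f = np.random.default_rng(0).standard_normal(len(G))   # arbitrary test function on G
fhat = group_transform(f, G)                            # values are 1x1, 1x1 and 2x2 matrices

energy_time = np.mean(f ** 2)                           # ||f||^2 with m(G) = 1
energy_freq = sum(fhat[n].shape[0] * np.sum(np.abs(fhat[n]) ** 2) for n in NAMES)
print(np.isclose(energy_time, energy_freq), np.allclose(inverse_transform(fhat, G), f))
```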

Since many of the classical groups are abelian, the abstract group (Fourier) transform defined
by Equation (I.1) extends and unifies all variants of the Fourier transform that occur in signal
processing. All the familiar and important properties of the classical transforms remain valid.
Thus, with the proper definitions, translations on G are converted into multiplications, and the
convolution of two functions in $L^1(G)$ has a transform equal to the product of the individual
transforms. Also, in the abelian case, there are a variety of results centered around the duality
formula

$$ (G/H)^{\wedge} \cong H^{\perp} \qquad (I.4) $$

where H is a closed subgroup of G, and its annihilator is $H^{\perp} = \{\gamma \in \Gamma : \langle h, \gamma \rangle = 1 \text{ for all } h \in H\}$. For
instance, the Haar measure $m_{G/H}$ on the quotient group G/H may be chosen so that
$$ \int_G f(g)\, dm_G(g) = \int_{G/H} dm_{G/H}(\dot g) \int_H f(g + h)\, dm_H(h) \qquad (I.5) $$

for every $f \in L^1(G)$, a result known as Weil’s formula. In Equation (I.5), $m_H$ is Haar measure on H
and $\dot g$ is the coset g + H.
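A standard consequence of the duality formula (I.4) is the Poisson summation formula. On a finite cyclic group it states that $\sum_{h \in H} f(h) = |H^{\perp}|^{-1} \sum_{\gamma \in H^{\perp}} \hat f(\gamma)$, with $\hat f$ the unnormalized DFT. The sketch below checks this numerically; the choices G = Z_N with N = 12 and H = 3Z_12 = {0, 3, 6, 9} are illustrative assumptions. Here $H^{\perp}$ is the set of frequencies k with $e^{2\pi i k h/N} = 1$ for all h ∈ H, and its size equals |G/H|, as (I.4) predicts.

```python
import numpy as np

N, d = 12, 3                           # G = Z_12 and subgroup H = d * Z_12
H = np.arange(0, N, d)                 # H = {0, 3, 6, 9}
# Annihilator H^perp: frequencies k with exp(2*pi*i*k*h/N) = 1 for every h in H
H_perp = np.array([k for k in range(N) if all((k * h) % N == 0 for h in H)])

f = np.random.default_rng(1).standard_normal(N)
F = np.fft.fft(f)                      # F(k) = sum_n f(n) exp(-2*pi*i*k*n/N)

lhs = f[H].sum()                       # sum of f over the subgroup
rhs = F[H_perp].sum() / len(H_perp)    # average of the transform over the annihilator
print(H_perp, len(H_perp) == N // len(H), np.isclose(lhs, rhs))
```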

We will be interested in the following generic applications of the Fourier transform. First, it
serves to diagonalize convolution operators on a locally compact abelian or compact group. Such
operators are discussed briefly in Section II.3 as generalizations of time-invariant filters, and in
Chapter III as group filters. Second, on the finite groups of Chapter III there are fast algorithms
for computing the Fourier transform. (Just how ‘fast’ such algorithms are for a particular group
G depends in an interesting way on the subgroup structure of G; the determination of this
structure is a very difficult problem in general, but tractable in special cases. For instance, if G is
abelian the issue reduces to the factorization of G into cyclic subgroups or, in another
approach, to the behavior of the group characters on the cosets determined by the members of a
composition series. These fast transforms can then be used to compute group filters which, in
turn, can serve as suboptimal approximants to Wiener filters. Further discussion is given in
Chapter III.) Finally, partial Fourier transforms are often used for purposes of data compression
and feature extraction for pattern recognition. We use this term to mean any linear
transformation T on $L^2(G)$, where G is a unimodular locally compact group, of the form given
by the right hand side of Equation (I.2), where λ is now any unitary representation of G. Thus if
f is a received datum which can be considered to belong to $L^2(G)$, then T(f) is a statistic whose
value, often called a spectral component or a feature of f, contains information about the
underlying signal. Thus T(f) could be used as a basis for classifying the datum f into two or
more pattern classes or, if the dimension of the representation λ is small relative to that of
$L^2(G)$, as one means of data compression. Note that for the practical case of finite dimensional
data we have many choices for both G and λ, so this approach subsumes many special cases.
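For G = Z_N with N = 2^k, the subgroup chain Z_N ⊃ 2Z_N ⊃ 4Z_N ⊃ ... yields the familiar radix-2 Cooley-Tukey FFT: the transform on Z_N is assembled from transforms of the data restricted to the subgroup 2Z_N ≅ Z_{N/2} and to its coset 1 + 2Z_N, which reduces the cost from O(N²) to O(N log N) operations. The recursive sketch below is illustrative only, not an optimized implementation, and the function name is hypothetical; it is checked against `numpy.fft.fft`, which uses the same sign convention $e^{-2\pi i k n/N}$ without normalization.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    if n % 2:
        raise ValueError("length must be a power of 2")
    even = fft_radix2(x[0::2])   # data on the subgroup 2Z_n (isomorphic to Z_{n/2})
    odd = fft_radix2(x[1::2])    # data on the coset 1 + 2Z_n
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n) * odd
    return np.concatenate([even + twiddle, even - twiddle])

x = np.random.default_rng(2).standard_normal(16)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))
```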

I.3 MATHEMATICAL RESULTS IN SIGNAL PROCESSING

Earlier we suggested the decomposition of a scientific field into problems, methods, and
results, for the purpose of obtaining some insight into its activities. At this point we have
discussed some problems and methods, at a fairly abstract level, and from a mathematical
direction. We will conclude this chapter by indicating a few specific results.

Now evidently the huge literature in signal processing is teeming with ‘results’, so we are
going to have to be rather choosy here. Before presenting our brief list of results we therefore
indicate the criteria for their inclusion. First, the result should be a definite, crisp, and nontrivial
mathematical theorem, or at least a tightly knit collection of such. Further, it should not just sit
in solitary splendor; it should have engendered significant further developments.
Finally, it, and/or some of these ensuing developments, should be widely used in signal
processing practice. We also acknowledge the subjective nature of all these criteria, and of the
decision as to whether this or that result meets all of them. Such subjectivity is a recurrent theme
of this report.
Here is our list of results, followed by some brief comments.

•  Karhunen-Loève expansion

•  Spline  algorithm 

•  Maximum  entropy  principle 

•  Sampling  theorem 

•  Kolmogorov  isomorphism 

•  Fast  Fourier  transform 

Experienced readers will, no doubt, consider other results deserving of mention; examples might
be some version of a stochastic filter (Wiener, Kalman, . . .), the Levinson-Durbin method for
fast solution of Toeplitz linear systems, etc. We simply feel that such other results leave at least
one of our three criteria unsatisfied.

The original K-L expansion (1947) represented a stochastic process defined on a compact
interval, with continuous covariance function, as an infinite linear combination of orthonormal
functions, with mean square convergence. This had the practical effect of ‘coordinatizing’ the
process by the countable set of random coefficients which, most importantly, turn out to be
uncorrelated if the basis functions are chosen to be the eigenfunctions of the integral operator
defined by the covariance function. Truncating this expansion after a fixed number of terms gives
the minimum mean square error among all such truncated orthonormal expansions, and also
minimizes an entropy function. Thus the K-L expansion is in several respects an optimal way to
decompose (the sample functions of) the given process. Applications to signal detection soon
followed (1950).

Nowadays there are many generalizations. The simplest is to replace the original process by
a second order probability measure on a Hilbert space and to expand a random vector in the
eigenvectors of the associated covariance operator. This operator, being self-adjoint and nuclear,
indeed has an orthonormal basis of eigenvectors, and the resulting expansion converges with
probability one. When the Hilbert space is finite dimensional there are a variety of applications
to pattern recognition and data compression, and here the K-L expansion serves as a benchmark
for the performance of other suboptimal but faster data processors (see Chapter III and, for
example, [14]).
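In the finite dimensional case the K-L expansion is simply the eigendecomposition of the covariance matrix. The sketch below assumes, for illustration, the first-order Markov covariance $C_{ij} = \rho^{|i-j|}$ with ρ = 0.9 on blocks of length 8 (a common test model, not one taken from the text). It checks that the K-L coefficients are uncorrelated, that keeping m coefficients leaves a mean square error equal to the sum of the discarded eigenvalues, and that this error is no larger than that of the best m coefficients of the unitary DFT, which plays the role of a fast suboptimal alternative. The helper names are hypothetical.

```python
import numpy as np

def klt(C):
    """Eigenvalues (descending) and K-L basis (rows of Phi) of a covariance matrix C."""
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order].T

def truncation_mse(C, basis, m):
    """E||x - x_m||^2 when x has covariance C, is expanded in the orthonormal rows of
    `basis`, and only the first m coefficients are kept."""
    kept = basis[:m]
    residual = np.eye(C.shape[0]) - kept.conj().T @ kept   # projector onto the discarded span
    return float(np.real(np.trace(residual @ C @ residual.conj().T)))

n, rho, m = 8, 0.9, 3
idx = np.arange(n)
C = rho ** np.abs(idx[:, None] - idx[None, :])     # assumed Markov covariance

lam, Phi = klt(C)
F = np.fft.fft(np.eye(n), norm="ortho")            # unitary DFT; rows are basis vectors
dft_var = np.real(np.diag(F @ C @ F.conj().T))     # variances of the DFT coefficients
F_sorted = F[np.argsort(dft_var)[::-1]]            # highest-variance DFT terms first

print(np.allclose(Phi @ C @ Phi.T, np.diag(lam)))  # K-L coefficients are uncorrelated
print(truncation_mse(C, Phi, m), lam[m:].sum(), truncation_mse(C, F_sorted, m))
```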

A different direction of generalization is to the case of Gaussian measures on a general
Banach space, for example, the space of continuous functions on a compact metric space. From
such work we learn, inter alia, that the K-L expansion over an interval converges uniformly with
probability one, at least in the Gaussian case. For one such result, unifying several earlier ones,
we refer to [15] where, in particular, the role of the reproducing kernel Hilbert space (RKHS)
associated to a given Gaussian measure is stressed.

As a familiar example consider classical Brownian motion on the interval [0, 1]. Its
covariance function is min(s, t) for 0 ≤ s, t ≤ 1, and consequently its K-L expansion is a random
Fourier sine series with zero-mean independent Gaussian coefficients. On the other hand, its
associated RKHS is the Sobolev space consisting of absolutely continuous functions on [0, 1],
vanishing at 0, and with square-integrable derivative. Proceeding in this direction we would be
led to the classical Wiener measure on the Banach space of continuous functions C[0, 1], as the
space of sample paths of Brownian motion, and then to the more recent unifying theory of
abstract Wiener spaces. But that is another story [16].
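For Brownian motion the K-L data are known in closed form: the integral operator with kernel min(s, t) on [0, 1] has eigenvalues $\lambda_k = ((k - \tfrac12)\pi)^{-2}$ and eigenfunctions $\sqrt{2}\sin((k - \tfrac12)\pi t)$, k = 1, 2, ..., so that $W(t) = \sum_k Z_k \sqrt{2}\,\sin((k-\tfrac12)\pi t)/((k-\tfrac12)\pi)$ with the $Z_k$ independent standard Gaussian variables. The sketch below compares the leading eigenvalues of a discretized operator with these values; the n-point grid with equal quadrature weights is a crude discretization assumed here for brevity.

```python
import numpy as np

n = 500
t = np.arange(1, n + 1) / n                      # grid points in (0, 1]
K = np.minimum.outer(t, t) / n                   # kernel min(s, t) times quadrature weight 1/n
numeric = np.sort(np.linalg.eigvalsh(K))[::-1][:5]

k = np.arange(1, 6)
exact = 1.0 / ((k - 0.5) * np.pi) ** 2           # eigenvalues of the covariance operator
rel_err = np.abs(numeric / exact - 1)
print(rel_err.max())                             # small, and shrinks as n grows
```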

The spline algorithm, as we are using this term, is a very general procedure for estimating an
unknown element in a Hilbert space from partial information about it. The desired element is
often construed as a model of some observed phenomenon, but the choice of the underlying
Hilbert space is also part of the modeling procedure. In the simplest (noise-free) case it is
assumed that a finite amount of linear information about the element is available, along with a
bound on its norm. The optimal estimate, in a minimax sense, is then the value of the
pseudoinverse of the data operator at the data vector. This is the classic prototype of a linear
data processor as defined in the preceding section. The optimal estimate, so obtained, is
sometimes called an abstract interpolating spline. This is because when the data consist of
sampled values, and the Hilbert space is the Sobolev space of smooth functions on an interval
with square integrable second derivative, the estimate turns out to be the (unique) cubic spline
interpolant of the data.
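In finite dimensions the spline algorithm is a one-line computation. The sketch below assumes, purely for illustration, that the unknown element lies in R^10 with the Euclidean norm and that the data are four generic linear functionals of it (a random 4×10 data operator A). The estimate $A^{+}b$ interpolates the data, has the smallest norm of all elements consistent with them, and is orthogonal to the estimation error, because the error lies in the null space of A while the estimate lies in its row space.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 4
A = rng.standard_normal((k, n))     # hypothetical data operator: k linear measurements
x_true = rng.standard_normal(n)     # unknown element of R^n
b = A @ x_true                      # noise-free data

x_hat = np.linalg.pinv(A) @ b       # spline estimate: pseudoinverse applied to the data

print(np.allclose(A @ x_hat, b))                          # interpolates the data
print(np.linalg.norm(x_hat) <= np.linalg.norm(x_true))    # minimal norm among interpolants
print(np.isclose(x_hat @ (x_true - x_hat), 0.0))          # error is orthogonal to the estimate
```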

The early results on spline functions (piecewise polynomial functions joined smoothly, but
not analytically, together) are due to Schoenberg and Sard in the late 1940s. The Hilbert space
formalization, originally termed the ‘hypercircle inequality’ because of its geometric interpretation,
was made by Golomb and Weinberger [17] and by de Boor and Lynch [18]. During the last 15 years this
basic result has evolved into two dynamic and powerful data processing methodologies: linear
inverse theory (e.g., [19, 20]) and the theory of optimal algorithms [2], already mentioned in
Section I.1.

One  much-used  application  is  the  extrapolation  of  a  bandlimited  function  from  sampled  data 
or,  equivalently,  the  estimation  of  the  spectrum  (Fourier  transform)  from  such  data.  In  the  latter 
case  the  resulting  estimate  has  been  termed  the  ‘modified  discrete  Fourier  transform’  [21],  and  is 
used  with  over-sampled  (higher  than  Nyquist  rate)  data.  Another  application  occurs  in  the 
burgeoning  field  of  tomography,  where  the  reconstruction  of  cross-sectional  tissue  densities  is 
attempted,  based  on  the  observed  attenuation  of  a  finite  number  of  x-ray  beams  [22,  23]. 

The Principle of Maximum Entropy (PME) has a complex and controversial history, which
is reviewed by E. Jaynes in [24], for example. The pioneers involved include figures in the
foundations of probability, from Laplace to Jeffreys, in statistical mechanics, including Boltzmann
and Gibbs, and in information theory, especially Shannon. In one direction, popularized by Jaynes and
S. Kullback, it provides a systematic way of estimating probability distributions from known
constraints; often these latter are certain moments of the distribution. Indeed, virtually all
standard probability distributions can be so derived and characterized. PME has been used in
statistical decision making to assign probabilities to possible outcomes, and therefore to permit
business and economic decisions. PME has since been subsumed by PMCE, the Principle of
Minimum Cross-entropy, a very general method of inductive inference wherein a probability
obeying  known  constraints.  Here  closeness  of  a  pair  of  distributions  p,  q  is  measured  by  their 
cross-entropy 


$$ H(p, q) = \int \log\left(\frac{dp}{dq}\right) dp , \qquad (I.6) $$

assuming that p is absolutely continuous with respect to q (if not, $H(p, q) = +\infty$). Among its many
consequences, PMCE can be used to neatly derive the classical method of maximum likelihood for
statistical parameter estimation. See [25] for a summary of the methodology, references to earlier
work, and applications to pattern classification, speech processing, image enhancement, and
particularly to spectrum estimation.
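On a finite set, (I.6) becomes $H(p, q) = \sum_i p_i \log(p_i/q_i)$, and minimizing it subject to a prescribed mean $\sum_i p_i x_i = \mu$ gives the exponentially tilted distribution $p_i \propto q_i e^{\theta x_i}$. With a uniform prior q this is exactly the maximum entropy distribution, since then $H(p, q) = \log n - (\text{entropy of } p)$. The sketch below is a minimal illustration rather than a general PMCE solver: it finds θ by bisection, using the fact that the tilted mean increases with θ. The loaded-die example (faces 1 to 6, prescribed mean 4.5) and the function names are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = sum_i p_i log(p_i / q_i); +inf when p is not absolutely continuous wrt q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):
        return np.inf
    mask = p > 0                                   # terms with p_i = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def min_cross_entropy(values, prior, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Distribution closest to `prior` in cross-entropy among those with the given mean."""
    values = np.asarray(values, float)
    prior = np.asarray(prior, float)

    def tilt(theta):
        z = theta * values
        w = prior * np.exp(z - z.max())            # subtract max(z) to avoid overflow
        return w / w.sum()

    for _ in range(iters):                         # bisection on the monotone mean
        mid = 0.5 * (lo + hi)
        if tilt(mid) @ values < target_mean:
            lo = mid
        else:
            hi = mid
    return tilt(0.5 * (lo + hi))

faces = np.arange(1, 7)
uniform = np.full(6, 1 / 6)
p = min_cross_entropy(faces, uniform, 4.5)
print(np.round(p, 4), p @ faces)
```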

It is this latter area, of course, that is of greatest overall significance in signal processing.
Initially PME was introduced to the signal processing community by J. Burg in 1967 [26], as a
new procedure for estimating the power spectrum of a stationary time series from partial
knowledge of its autocorrelation function. Earlier approaches implicitly assumed this function to
vanish for sufficiently large time lags, as they utilized the Fourier transform of the product of a
window function of compact support with the known or estimated autocorrelation function.

Burg’s idea was to view the spectrum estimation as an infinite dimensional optimization, wherein
the entropy of the process, taken proportional to the integral of the logarithm of the spectral
density function (the underlying random process being assumed Gaussian), was maximized subject
to the linear constraints imposed by the known values of the autocorrelation function. The
solution turned out to be the spectrum of an autoregressive process whose order is one less than
the number of constraints; that is, knowledge of r(0), r(1), ..., r(p) leads to an AR(p) spectrum.
In time, a better mathematical result has emerged, one that removes both the stationarity and
Gaussian assumptions [27].
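Burg's solution can be computed from the known lags r(0), ..., r(p) by the Levinson-Durbin recursion mentioned earlier, which solves the Toeplitz (Yule-Walker) equations for the coefficients $a_1, \dots, a_p$ of the prediction-error filter and the innovation variance $\sigma^2$; the maximum entropy spectrum is then $S(\omega) = \sigma^2 / |1 + \sum_{j=1}^{p} a_j e^{-ij\omega}|^2$. The sketch below is a minimal real-valued version that assumes exact (not estimated) autocorrelation values; the lag values and function names are illustrative. It checks that the resulting AR(p) spectrum reproduces the prescribed lags, which are the constraints of the optimization, and that exact AR(1) lags return the AR(1) model.

```python
import numpy as np

def levinson_durbin(r):
    """Solve the Yule-Walker equations for real autocorrelation lags r[0..p].
    Returns the prediction-error filter a (a[0] = 1) and the innovation variance."""
    r = np.asarray(r, float)
    p = len(r) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, p + 1):
        k = -(r[m] + a[1:m] @ r[m - 1:0:-1]) / err   # reflection coefficient
        a_prev = a.copy()
        a[1:m] = a_prev[1:m] + k * a_prev[m - 1:0:-1]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

def max_entropy_spectrum(r, n_freq=4096):
    """AR(p) spectrum S(w) = err / |sum_j a_j exp(-i j w)|^2 on w = 2*pi*l/n_freq."""
    a, err = levinson_durbin(r)
    w = 2 * np.pi * np.arange(n_freq) / n_freq
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return w, err / np.abs(A) ** 2

r = np.array([2.0, 1.0, 0.3])                  # assumed known lags r(0), r(1), r(2)
w, S = max_entropy_spectrum(r)
r_model = np.real(np.fft.ifft(S))[: len(r)]    # approximates (1/2pi) * int S(w) exp(ikw) dw
print(np.allclose(r_model, r))

phi = 0.8                                      # AR(1) lags are phi**k / (1 - phi**2)
a, err = levinson_durbin(phi ** np.arange(4) / (1 - phi ** 2))
print(a, err)
```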

Nowadays  the  maximum  entropy  method  is  seen  as  the  first  of  several  parametric  methods 
for  spectral  estimation,  each  involving  a  model  fit,  in  some  sense,  to  given  time  series  data.  These 
often  provide  superior  frequency  resolution  compared  with  classical  techniques.  Meanwhile 
research  based  on  PME  has  swung  in  the  direction  of  multivariate  spectrum  estimation  (see,  e.g., 
the survey [28]), where, even with uniform sampling, the maximum entropy spectral estimate is
not  the  same  as  an  autoregressive  model  fit. 

The remaining three of our distinguished results have some definite group-theoretic content
and are consequently discussed in subsequent sections of this report.
II. ASPECTS OF GROUP THEORY IN SIGNAL MODELING AND SAMPLING


Guided  by  the  philosophy  explained  in  the  preceding  chapter,  namely,  to  search  for 
mathematical  paradigms  and  to  justify  operations  on  data,  we  review  next  some  fundamental 
models  of  data  processing  with  main  emphasis  on  those  where  group  theory  plays  a  role.  We 
stress  that  the  attempt  here  is  to  describe  a  unified  and  systematic  approach  to  a  great  diversity 
of  problems.  Hence  a  recurrent  theme  will  be  that  even  if  a  group  is  not  immediately  apparent  in 
certain  situations,  it  may  be  useful,  in  the  above  sense,  to  try  to  uncover  a  group  ‘lurking’  in  the 
background,  or  even  to  impose  a  group  structure,  on  account  of  the  immense  body  of  theory  and 
technique  that  then  becomes  available. 

II.1 BASIC ALGORITHMS OF DATA PROCESSING

We begin with some empirical observations concerning the practice of computerized data
processing.  Whatever  the  original  nature  of  the  data,  it  is  usually  subjected  to  a  series  of 
pre-processing  steps  that  serve  to  reduce  it  to  finite  dimensional  form.  Among  these  steps  might 
be  truncation,  discretization  and  sampling,  quantization,  etc.  This  reduced  data  is  called  a  block 
and  its  dimension  the  blocklength.  The  latter  is  determined  by  various  factors,  especially 
computer  storage  limitations,  and  the  physical  and  statistical  nature  of  the  original  data.  In 
particular,  successive  data  blocks  must  be  treated  as  independent  of  one  another,  and  only 
statistical  associations  between  components  of  a  block  can  be  considered. 

We  also  observe  that  most  data  processing  algorithms  involve  a  transformation  of  a  data 
block  into  another  block,  possibly  of  a  different  length.  Further,  these  transformations  are 
usually  linear,  perhaps  achieved  as  a  composite  of  several  relatively  simple  linear  transformations. 
This  is  not  surprising  given  the  highly  developed  theory  of  linear  transformations  vis-a-vis  any 
other  class  of  transformations.  We  will  continue  this  tradition  and  discuss  only  linear  data 
processors. 

Next,  we  observe  that  the  most  common  data  processing  algorithms  are  (variants  of)  the 
Wiener-Kalman  filter  and  the  fast  Fourier  transform  (FFT).  In  order  to  better  focus  our 
attention  we  will  continue  to  enforce  the  assumption  just  made  about  the  way  the  data  is 
presented  for  processing,  namely,  block  by  block.  This  will  eliminate  from  further  consideration 
the  Kalman-type  recursive  filters. 

The term ‘Wiener filter’ is used generically for a linear transformation of the data chosen so
as to minimize the average error in estimating a signal contained in the data. In the present
simple situation of nonrecursive block data we can write

$$ \underbrace{y}_{\text{data}} = \underbrace{s}_{\text{signal}} + \underbrace{\eta}_{\text{noise}} \qquad (II.1) $$
and  then  the  Wiener  filter  W  is  defined  by 
$$ E\,\| s - W y \|^2 = \min \qquad (II.2) $$

This minimization can be carried out by standard quadratic optimization techniques in the space
of suitably dimensioned matrices, with inner product defined by ⟨A, B⟩ = trace(AB*); the
result, for zero-mean signal and noise, is

$$ W = C_{sy} C_y^{-1} = C_s (C_s + C_\eta)^{-1} . \qquad (II.3) $$

The middle term in Equation (II.3) is the product of the cross-covariance matrix of the signal
and the data with the inverse of the covariance matrix of the data. Under the conventional
assumption of signal and noise independence (or just zero correlation), we have further
$C_{sy} = C_s$ and $C_y = C_s + C_\eta$, as indicated.
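A minimal numerical sketch of (II.3), assuming for illustration a Markov-type signal covariance $C_s = [0.95^{|i-j|}]$ and white noise $C_\eta = 0.5I$ on blocks of length 6 (these models and the helper names are not from the text). It uses the identity $E\|s - Wy\|^2 = \operatorname{tr}((I-W)C_s(I-W)^* + W C_\eta W^*)$, valid for zero-mean uncorrelated signal and noise, and confirms that the Wiener filter does better both than no filtering and than a slightly perturbed filter.

```python
import numpy as np

def wiener_filter(C_s, C_n):
    """W = C_s (C_s + C_n)^{-1}, Equation (II.3), for real symmetric covariances."""
    C_y = C_s + C_n
    # Solve W C_y = C_s instead of inverting C_y; valid because C_y and C_s are symmetric
    return np.linalg.solve(C_y, C_s).T

def mse(W, C_s, C_n):
    """E||s - W y||^2 for y = s + noise, with zero-mean uncorrelated s and noise."""
    I = np.eye(C_s.shape[0])
    return float(np.trace((I - W) @ C_s @ (I - W).T + W @ C_n @ W.T))

n = 6
idx = np.arange(n)
C_s = 0.95 ** np.abs(idx[:, None] - idx[None, :])   # assumed signal covariance
C_n = 0.5 * np.eye(n)                               # assumed white noise covariance
W = wiener_filter(C_s, C_n)

E = np.random.default_rng(5).standard_normal((n, n))
print(mse(W, C_s, C_n), mse(np.eye(n), C_s, C_n), mse(W + 0.01 * E, C_s, C_n))
```

At the optimum the error simplifies to $\operatorname{tr}(C_s - W C_s)$, which is a convenient check on an implementation.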

A major extension of the foregoing model, deserving of brief mention here, is to the
situation where the unknown signal x belongs to a Hilbert space $H_1$, is transformed by a linear
operator A into a second Hilbert space $H_2$, and is observed there in the presence of a noise process
η, modeled as a zero-mean weak $H_2$-valued random variable. Thus

$$ y = A(x) + \eta . \qquad (II.4) $$

The operator A represents the effect of a measurement device (probe) and/or a communication
channel. Problems leading to such models abound in optics, geophysics, biomedicine, etc., where
typically x is a function representing some physical variable of interest across a continuum of
values of one or more variables.

Successful estimates of x in Equation (II.4) by means of a linear operator $B: H_2 \to H_1$ will
depend on proper incorporation of prior information about x, either deterministic or stochastic.
Usually the operator A is compact (if not actually of finite rank), and then methods involving
pseudoinversions and singular function expansions can be employed. The concept of
regularization, to compensate for the ill-posed nature of Equation (II.4), is important here. But
all this is essentially pure Hilbert space theory and, so far at least, does not seem to have
benefited from group theoretic techniques. So we will conclude this brief excursion by noting that
the problem of recovering x from y in Equation (II.4), given prior information or constraints on
x, is what we mean by a ‘linear inverse problem’, and that selected references have been provided
following the discussion of the spline algorithm in Section I.3.
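One common form of regularization, Tikhonov's (named here as an example, not as a method singled out by the text), replaces the pseudoinverse by $B_\alpha = (A^*A + \alpha I)^{-1}A^*$ for a small parameter α > 0. In terms of a singular value decomposition of A, it damps each singular component by the filter factor $s^2/(s^2 + \alpha)$, which suppresses the noise amplification caused by small singular values. The sketch assumes a hypothetical Gaussian blurring matrix on 40 points, a sinusoidal test signal, noise level $10^{-4}$, and α = $10^{-6}$; all of these numbers are illustrative only.

```python
import numpy as np

def tikhonov(A, y, alpha):
    """x = (A^T A + alpha I)^{-1} A^T y, computed from the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = s / (s ** 2 + alpha) * (U.T @ y)     # filter factors s^2/(s^2+alpha) times 1/s
    return Vt.T @ coeffs

n = 40
t = np.linspace(0.0, 1.0, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.05 ** 2)) / n   # hypothetical blur
x_true = np.sin(2 * np.pi * t)
y = A @ x_true + 1e-4 * np.random.default_rng(6).standard_normal(n)

x_pinv = np.linalg.pinv(A) @ y        # unregularized estimate
x_reg = tikhonov(A, y, alpha=1e-6)    # regularized estimate
print(np.linalg.norm(x_pinv - x_true), np.linalg.norm(x_reg - x_true))
```

Comparing the two printed errors shows the effect of the filter factors on this ill-conditioned problem; in practice α must be chosen with care, for example from knowledge of the noise level.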

Returning now to our theme of basic algorithms, we next discuss the FFT. This is, of
course, just an accelerated procedure for computing the discrete Fourier transform (DFT). The
latter amounts to multiplying a given data block y by a particular unitary matrix $F_N$, where N is
the blocklength of y and

$$ F_N = N^{-1/2} \left[ W_N^{mn} \right] , \qquad 0 \le m, n < N , $$

with $W_N = \exp(-2\pi i / N)$. Thus, the DFT is another linear transformation which can be applied
to data vectors of arbitrary (finite) dimension. However, unlike the Wiener filter above, which is
defined by a clearly stated purpose, it is not clear, a priori, why a DFT would be applied as part
of a data analysis procedure. Indeed, the DFT is defined independently of any assumptions
concerning the nature of the data. In fact, as we shall discuss in greater detail in the next
chapter, the DFT is but one of a large class of unitary transforms associated with groups of
finite order. These are the so-called group (Fourier) transforms, the formal definition of which
was given in Section I.2 for locally compact abelian or general compact groups.
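A minimal sketch of the DFT matrix, assuming the normalization $N^{-1/2}$ that makes $F_N$ unitary; with this normalization it coincides with NumPy's `norm="ortho"` convention. Its rows are the characters $n \mapsto W_N^{mn}$ of the cyclic group Z_N, scaled to unit norm, so it is the group transform of Z_N. The helper name is hypothetical.

```python
import numpy as np

def dft_matrix(N):
    """Unitary DFT matrix: F[m, n] = exp(-2*pi*i*m*n/N) / sqrt(N), 0 <= m, n < N."""
    idx = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

N = 8
F = dft_matrix(N)
y = np.random.default_rng(7).standard_normal(N)     # an arbitrary data block

print(np.allclose(F @ F.conj().T, np.eye(N)))       # F_N is unitary
print(np.allclose(F @ y, np.fft.fft(y, norm="ortho")))
print(np.isclose(np.linalg.norm(F @ y), np.linalg.norm(y)))   # energy is preserved
```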

Also  in  the  next  chapter  we  will  carefully  examine  the  rationale  for  taking  unitary 
transforms  of  data.  Roughly  a  unitary  transform  represents  the  data  vector  in  a  new  coordinate 
system  while  preserving  the  essential  information  contained  in  the  data.  Depending  on  the 
particular  goal  of  the  data  processing  we  may  expect  a  judiciously  chosen  unitary  transform  to 
reveal  hidden  features  of  the  data  (as  in  pattern  recognition  or  spectral  analysis),  or  to  result  in 
more  nearly  uncorrelated  coordinates  for  quantization  or  coding  purposes.  Although  there  is,  for 
any  signal,  an  optimal  unitary  transform  that  decorrelates  the  signal,  namely  the  discrete 
Karhunen-Loève transform (DKLT), its practical usefulness is limited by severe computational
difficulties,  as  well  as  possible  lack  of  knowledge  of  the  true  signal  statistics.  Thus  other  unitary 
transforms,  that  are  both  data  independent  and  computationally  efficient,  may  be  considered  as 
suboptimal  alternatives. 

In addition to the role just described, group transforms also serve as components of a class
of  linear  transformations  called  group  filters.  These  transformations,  which  may  equivalently  be 
described  as  group  convolutions,  are  again  data  independent  and  may,  depending  on  the  internal 
structure  of  the  underlying  groups,  be  computationally  efficient.  Hence  group  filters  offer  the 
possibility  of  doing  fast  suboptimal  Wiener  filtering.  And,  since  convolutions  are  defined  on  all 
groups,  abelian  or  not,  we  see  that  any  (finite)  group  can  be  used  to  define  a  family  of  data 
processing  operations,  the  success  of  which  is  a  function  of  the  group  structure,  the  signal 
statistics  and,  of  course,  the  overall  purpose  of  the  processing. 
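A group filter can be sketched directly from the definition of convolution on a finite group, $(f * h)(g) = \sum_{k \in G} f(k)\,h(k^{-1}g)$, with counting measure assumed. The sketch below is a naive O(|G|²) illustration, not a fast algorithm, and its helper names are hypothetical. For the cyclic group Z_N the convolution is ordinary circular convolution, which the DFT turns into pointwise multiplication; for the nonabelian symmetric group S3 (permutations of three symbols) the order of the factors matters.

```python
import itertools
import numpy as np

def cyclic_group(N):
    """Z_N: elements 0..N-1 under addition mod N, as (elements, product, inverse)."""
    return list(range(N)), (lambda a, b: (a + b) % N), (lambda a: (-a) % N)

def symmetric_group(n):
    """S_n: permutations as tuples, with (a*b)(i) = a[b[i]]."""
    elems = list(itertools.permutations(range(n)))
    mul = lambda a, b: tuple(a[b[i]] for i in range(n))
    inv = lambda a: tuple(sorted(range(n), key=lambda i: a[i]))   # inv[a[i]] = i
    return elems, mul, inv

def convolve(f, h, group):
    """(f * h)(g) = sum_k f(k) h(k^{-1} g); f, h are arrays indexed like the elements."""
    elems, mul, inv = group
    index = {g: i for i, g in enumerate(elems)}
    out = np.zeros(len(elems))
    for gi, g in enumerate(elems):
        for ki, k in enumerate(elems):
            out[gi] += f[ki] * h[index[mul(inv(k), g)]]
    return out

rng = np.random.default_rng(8)

Z8 = cyclic_group(8)
f, h = rng.standard_normal(8), rng.standard_normal(8)
# Convolution theorem on Z_8: the DFT diagonalizes the group filter
print(np.allclose(np.fft.fft(convolve(f, h, Z8)), np.fft.fft(f) * np.fft.fft(h)))

S3 = symmetric_group(3)
u, v = rng.standard_normal(6), rng.standard_normal(6)
print(np.allclose(convolve(u, v, S3), convolve(v, u, S3)))   # S3 is nonabelian: f*h != h*f in general
```

For a nonabelian group the group transform (I.2) no longer diagonalizes such a filter but block-diagonalizes it, one block per irreducible representation.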

The  upshot  of  this  section  has  been  to  suggest,  in  addition  to  the  well  recognized  roles 
played by Hilbert space/operator theory and by probability/statistics in signal processing, a
significant role also for group theory. This particular role, namely the use of group transforms
and filters as fast and convenient approximations to a variety of data processing tasks, will be
further described in the next chapter. The remainder of this chapter is devoted to a brief résumé
of  several  other  aspects  of  group  theory  in  signal  processing.  Each  of  these  topics  deserves  more 
attention  than  can  be  provided  in  the  present  report.  Hence  they  will  be  introduced  merely  to 
buttress  our  theme  that  a  group-theoretic  viewpoint  is  a  fruitful  one  for  many  reasons  in  the 
design  of  signal  models  and  data  processors. 

II.2 STATIONARY SIGNALS

Random  data  is  usually  modeled  as  a  realization  of  a  stochastic  process.  Thus  a  general 
mathematical  task  is  to  define  and  study  relatively  simple  processes  whose  realizations,  or  sample 
paths,  can  replicate  observations.  Now  a  stochastic  process  consists  of  a  family  of  random 
variables, whose values may be real, complex, or higher dimensional vectors. The family is often
thought of as indexed by an integer or real variable representing ‘time’, but other index sets are
not  uncommon.  For  example,  indices  may  comprise  a  set  of  2-  or  3-dimensional  points,  which 
correspond  to  the  geometric  distribution  of  sensors,  whose  outputs  are  the  data.  In  any  event,  it 
is important to distinguish between the random variables which together define the process and
the  realizations,  considered  as  functions  on  the  index  set.  The  process  may  equally  well  be  viewed 
as  a  probability  distribution  on  various  function  spaces  over  the  index  set,  any  one  of  which  may 
be  called  the  ‘sample  space’.  (In  particular,  in  the  computerized  processing  of  the  preceding 
section,  the  data  can  be  considered  as  a  random  sample  from  a  finite  dimensional  distribution.) 
Both  these  distinct  views  of  a  stochastic  process  lend  themselves  to  applications  of  group  theory 
and  the  corresponding  harmonic  analysis.  For  the  next  couple  of  sections  we  will  emphasize  the 
former  view,  and  tl  e  switch  back  to  the  latter  (sample  space  distribution)  view.. 

Classically,  stationary  processes  were  proposed  as  models  for  signals  whose  statistical 
fluctuations  appeared  to  be  independent  of  time.  The  simplest  of  several  possible  precise 
definitions  is  that  the  process  mean  should  be  a  constant,  and  that  the  covariance  function 
evaluated  at  points  s,  t,  should  depend  only  on  the  difference  t-s.  Such  processes  are  called 
weakly  (or,  wide-sense)  stationary.  More  restrictive  definitions  of  stationarity  may  be  given;  these 
involve  invariance  to  time  shifts  of  higher  moments,  or  of  the  entire  family  of  finite  dimensional 
distributions.  However,  these  will  not  be  required  for  what  follows. 

As far back as 1948 it was recognized that this definition involved the (additive) group
structure of the integers or the real numbers, according as time was represented discretely or
continuously. It was then a short leap (at least for mathematicians!) to extend the definition of
weak stationarity to processes defined on any locally compact abelian (lca) group. Thus if $G$ is
such a group, and $L_0^2(P)$ is the space of zero-mean second order random variables with respect
to the probability measure $P$, we say that the (continuous) mapping $g \mapsto X_g$, from $G$ into $L_0^2(P)$, is
a weakly stationary stochastic process on $G$ if

$$E(X_g\overline{X_h}) = E(X_{g+k}\overline{X_{h+k}}) = E(X_{g-h}\overline{X_e}) = r(g-h)$$

for all $g, h, k \in G$, where $e$ is the identity element of $G$; the function $r$ just defined on $G$ is the
covariance function of the process. A final bit of abstraction can be obtained by replacing the
space $L_0^2(P)$ by an arbitrary Hilbert space $H$ and then defining the function $r$ by

$$\langle X_g, X_h\rangle = \langle X_{g-h}, X_e\rangle = r(g-h). \tag{II.5}$$

It is remarkable that so much structure ensues from this simple hypothesis of weak
stationarity. First of all, it turns out that this concept is fundamentally linked with that of a
unitary representation of $G$ in $H$. Indeed, given such a representation $U$, that is, a (continuous)
homomorphism from $G$ into the group of unitary operators on $H$, and any vector $x \in H$, the process
$X_g = U(g)x$ is weakly stationary, since $\langle U(g)x, U(h)x\rangle = \langle U(h)^{*}U(g)x, x\rangle = \langle U(g-h)x, x\rangle$.
Conversely, if $g \mapsto X_g$ is a given weakly stationary mapping of $G$ into $H$, with covariance function $r$,
there exist a unitary representation $U$ of $G$ on $H$ and a vector $x \in H$ such that $r(g) = \langle U(g)x, x\rangle$.
These facts do not require that $G$ be abelian, and will reappear in Section III.3 in the context of
general finite groups.
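A minimal numerical sketch of the first half of this correspondence, assuming (for illustration only) the cyclic group $\mathbb{Z}_N$ acting on $H = \mathbb{C}^N$ by cyclic shifts; the helper name `U` and the choice $N = 8$ are ours, not the report's. It checks that $\langle X_g, X_h\rangle$ depends only on $g - h$ and equals $r(g-h) = \langle U(g-h)x, x\rangle$.

```python
import numpy as np

N = 8                                   # order of the cyclic group Z_N (illustrative)
rng = np.random.default_rng(0)


def U(g: int) -> np.ndarray:
    """Unitary shift on C^N: (U(g) x)(n) = x(n - g mod N)."""
    return np.roll(np.eye(N), g, axis=0)


x = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # any fixed vector in H
X = [U(g) @ x for g in range(N)]                          # the process X_g = U(g) x

# <u, v> is linear in u, so <X_g, X_h> = np.vdot(X_h, X_g).
gram = np.array([[np.vdot(X[h], X[g]) for h in range(N)] for g in range(N)])
r = np.array([np.vdot(x, U(g) @ x) for g in range(N)])   # r(g) = <U(g) x, x>

stationary = all(
    np.isclose(gram[g, h], r[(g - h) % N]) for g in range(N) for h in range(N)
)
print(stationary)  # True
```

Any other starting vector `x` gives the same conclusion; only the covariance function $r$ changes.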

Next, it follows from Bochner’s theorem that, owing to the positive definiteness of $r$, there is
a finite positive (regular Borel) measure $\mu$ on the dual group $\Gamma$ such that

$$r(g) = \int_\Gamma \gamma(g)\,d\mu(\gamma), \qquad g \in G. \tag{II.6}$$

This $\mu$ is called the spectral measure of the process $\{X_g\}$, and its Radon-Nikodym derivative with
respect to the Haar measure on $\Gamma$ (when it exists) is called the spectral density of $\{X_g\}$.
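On a finite group the content of (II.6) can be checked directly. The sketch below assumes $G = \mathbb{Z}_N$, whose dual is again $\mathbb{Z}_N$ with characters $\gamma(g) = e^{2\pi i g\gamma/N}$, so that $\mu$ is just a vector of $N$ non-negative weights (chosen at random here, purely for illustration). It builds $r$ from $\mu$, confirms that the covariance matrix $[r(g-h)]$ is positive semidefinite, and recovers $\mu$ from $r$ with NumPy's FFT.

```python
import numpy as np

N = 16
rng = np.random.default_rng(1)

mu = rng.uniform(0.0, 1.0, N)        # a spectral measure: non-negative weights on the dual group

g = np.arange(N)
chars = np.exp(2j * np.pi * np.outer(g, g) / N)   # chars[g, gamma] = gamma(g)

# Equation (II.6) on Z_N: r(g) = sum over gamma of gamma(g) * mu[gamma].
r = chars @ mu

# Covariance matrix C[g, h] = r(g - h); it is Hermitian because mu is real.
C = r[(g[:, None] - g[None, :]) % N]

# Positive definiteness of r means C is positive semidefinite.
eigs = np.linalg.eigvalsh(C)

# The DFT inverts (II.6): mu[gamma] = (1/N) * sum over g of r(g) * conj(gamma(g)).
mu_back = np.fft.fft(r) / N
```

The eigenvalues of $C$ turn out to be exactly $N\mu(\{\gamma\})$, which is the finite shadow of the fact that $\mu$ is non-negative precisely when $r$ is positive definite.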

Equation (II.6), a general version of the so-called Wiener-Khinchine relation, connects a
‘time-domain’ concept, the covariance function $r$, with a ‘frequency-domain’ construct, the spectral
measure $\mu$. In particular, it exactly locates the proper domain of definition of the latter as (the
Borel field of) the dual group. Thus given a weakly stationary process defined on any lca group,
discrete or not, of any dimension, we know where to set up a frequency domain analysis. In fact,
we can achieve a very tight yet decoupled relation between these two domains, as indicated next,
using another group-theoretic result.

Given a weakly stationary $H$-valued process $\{X_g\}$, $g \in G$, we can make the association

$$T : X_g \mapsto \langle g, \cdot\rangle$$

between the element $X_g$ in $H$ and the character defined by $g$ on $\Gamma$. Because the spectral measure
$\mu$ is finite, these characters generate the space $L^2(\Gamma, \mu)$, and it turns out that $T$ can be extended
to an isometric isomorphism between this space and the closed subspace $H_X$ of $H$ spanned by $\{X_g\}$.
This result is known as the Kolmogorov isomorphism, since Kolmogorov originally proved the special
case where $G$ is the group of integers. In this way we can replace the rather mysterious space $H_X$,
in practice consisting of linear functions of the random variables $\{X_g\}$, by a more familiar function
space $L^2(\Gamma, \mu)$.
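To make the isomorphism concrete, here is a small sketch under assumptions of our own: $G = \mathbb{Z}_N$, $H = \mathbb{C}^N$, and the stationary process $X_g = U(g)x$ with $(U(g)x)(m) = x(m+g)$. The spectral measure is then $\mu(\{\gamma\}) = |\langle x, v_\gamma\rangle|^2$, where $v_\gamma$ are the normalized characters. The hypothetical helper `T_inv` implements the inverse map $\phi \mapsto \sum_\gamma \phi(\gamma)E(\{\gamma\})x$; the sketch checks that it sends the character $\langle g, \cdot\rangle$ back to $X_g$ and preserves norms.

```python
import numpy as np

N = 8
rng = np.random.default_rng(2)
n = np.arange(N)

# Orthonormal characters: column gamma of V is v_gamma(m) = exp(2 pi i m gamma / N) / sqrt(N).
V = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)


def char(g: int) -> np.ndarray:
    """The function gamma -> <g, gamma> = exp(2 pi i g gamma / N) on the dual group."""
    return np.exp(2j * np.pi * g * n / N)


# Stationary process X_g = U(g) x with (U(g) x)(m) = x(m + g).
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = [np.roll(x, -g) for g in range(N)]

c = V.conj().T @ x            # c[gamma] = <x, v_gamma>
mu = np.abs(c) ** 2           # spectral measure of the process


def T_inv(phi: np.ndarray) -> np.ndarray:
    """Inverse Kolmogorov map L^2(Gamma, mu) -> H_X: phi -> sum_gamma phi(gamma) E({gamma}) x."""
    return V @ (phi * c)


# T_inv sends the character <g, .> to X_g ...
ok_chars = all(np.allclose(T_inv(char(g)), X[g]) for g in range(N))

# ... and is isometric: ||T_inv(phi)||^2 equals the integral of |phi|^2 against mu.
phi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
ok_isometry = np.isclose(np.linalg.norm(T_inv(phi)) ** 2, np.sum(np.abs(phi) ** 2 * mu))
```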

The isomorphism just mentioned can be explicitly implemented in terms of a stochastic
integral, which can in turn be rather easily derived from a generalized version of Stone’s theorem.
This generalization states that any (weakly) continuous unitary representation $U$ of an lca group
$G$ can be expressed as an integral of a ‘resolution of the identity’ $E$ on $\Gamma$:

$$U(g) = \int_\Gamma \langle g, \gamma\rangle\,dE(\gamma), \qquad g \in G. \tag{II.7}$$

Here $E$ is a strongly countably additive measure on $\Gamma$ whose values are orthogonal projections on
$H$, with $E(\Gamma) = I$. Thus $U$ is a kind of abstract Fourier transform of the projection-valued
measure $E$. The validity of Equation (II.7) proceeds from Bochner’s theorem and some general
measure theory.
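In finite dimensions, (II.7) is simply the spectral decomposition of the shift operators. The sketch below assumes $G = \mathbb{Z}_N$ acting on $\mathbb{C}^N$ by $(U(g)x)(m) = x(m+g)$; the function names are ours. It takes $E(\{\gamma\})$ to be the orthogonal projection onto the normalized character $v_\gamma$ and checks that $\sum_\gamma \langle g, \gamma\rangle E(\{\gamma\})$ reproduces the shift, and that the projections resolve the identity.

```python
import numpy as np

N = 6
n = np.arange(N)
V = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)  # column gamma: normalized character

# Resolution of the identity: E({gamma}) = orthogonal projection onto span(v_gamma).
E = [np.outer(V[:, k], V[:, k].conj()) for k in range(N)]


def U_spectral(g: int) -> np.ndarray:
    """Equation (II.7) on Z_N: U(g) = sum over gamma of <g, gamma> E({gamma})."""
    return sum(np.exp(2j * np.pi * g * k / N) * E[k] for k in range(N))


def U_shift(g: int) -> np.ndarray:
    """The translation (U(g) x)(m) = x(m + g), written as a permutation matrix."""
    return np.roll(np.eye(N), -g, axis=0)
```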
It follows that we can express our original $H$-valued process $\{X_g\}$ in the form

$$X_g = U(g)X_e = \int_\Gamma \langle g, \gamma\rangle\,dE(\gamma)\,X_e,$$

where now the integrator is the $H$-valued orthogonally scattered measure $W$ whose value at any
Borel set $B \subset \Gamma$ is the vector $W(B) = E(B)X_e$. When $H = L_0^2(P)$ this expression is the
stochastic integral just mentioned, and defines the inverse of the Kolmogorov isomorphism:

$$T(X_g) = \langle g, \cdot\rangle, \qquad X_g = T^{-1}(\langle g, \cdot\rangle) = \int_\Gamma \langle g, \gamma\rangle\,dW(\gamma), \qquad g \in G. \tag{II.8}$$

Classically, this formula is Cramér's representation of the process as the Fourier transform of a
random measure with orthogonal increments. Finally, the connection between Equations (II.6)
and (II.8) is simple: $\mu = \|W(\cdot)\|^2$.
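The following Monte Carlo sketch illustrates Cramér's representation with $H = L_0^2(P)$, under assumptions of our own: $G = \mathbb{Z}_N$, a hand-picked spectral measure `mu`, and a random measure whose values $W(\{\gamma\})$ are independent circular complex Gaussians with $E|W(\{\gamma\})|^2 = \mu(\{\gamma\})$. Averaging over many realizations, $E(X_g\overline{X_e})$ should approach $r(g)$ from (II.6).

```python
import numpy as np

N, M = 8, 100_000                 # group order, number of independent realizations
rng = np.random.default_rng(3)

mu = np.array([1.0, 0.5, 0.25, 0.1, 0.05, 0.1, 0.25, 0.5])   # assumed spectral weights
k = np.arange(N)
chars = np.exp(2j * np.pi * np.outer(k, k) / N)              # chars[g, gamma] = <g, gamma>

# Orthogonally scattered random measure: uncorrelated W({gamma}) with E|W({gamma})|^2 = mu[gamma].
W = np.sqrt(mu / 2) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

# Equation (II.8) on Z_N: X_g = sum over gamma of <g, gamma> W({gamma}); one row per realization.
X = W @ chars.T

# Sample estimate of E(X_g conj(X_e)) compared with r(g) from Equation (II.6).
r_hat = (X * X[:, [0]].conj()).mean(axis=0)
r = chars @ mu
```

With $10^5$ realizations the estimate typically agrees with $r$ to about two decimal places; the agreement is statistical, not exact.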

With these general principles established, more specialized models can be developed, based
on the integration of various stochastic measures. For example, we may say that a ‘white noise’
on the group $G$ is a stochastic measure $W$ whose associated scalar measure ($= \|W(\cdot)\|^2$) is
a Haar measure on $G$. The convolution of an $L^2$-function $\phi$ on $G$ with $W$ then results in a
certain weakly stationary process

$$X_g = \int_G \phi(g - t)\,dW(t), \qquad g \in G, \tag{II.9}$$

which is a generalization of the classical ‘moving average’. The latter occurs when $G$ is the
group of integers $\mathbb{Z} = \{n\}$: the white noise reduces to a sequence of zero-mean uncorrelated
random variables $\{w_n\}$ of variance 1, and $\phi$ becomes a square summable sequence $\{\phi_n\}$. Then
Equation (II.9) is just

$$X_n = \sum_{k=-\infty}^{\infty} \phi_{n-k}\,w_k, \tag{II.10}$$

the usual moving average representation.

In general, a weakly stationary process has the form of Equation (II.9) if and only if
its spectral measure is absolutely continuous with respect to Haar measure on $\Gamma$ [39].
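For a finitely supported sequence $\{\phi_n\}$ (an assumption made here so the sum in (II.10) is finite), the covariance is $r(m) = \sum_j \phi_{j+m}\phi_j$ and the spectral measure has density $|\Phi(\omega)|^2$ with $\Phi(\omega) = \sum_n \phi_n e^{-i\omega n}$, which illustrates the absolute continuity just mentioned. The sketch below, with coefficients of our own choosing, checks the spectral identity exactly and compares $r$ with time averages from one long simulated path.

```python
import numpy as np

rng = np.random.default_rng(4)
phi = np.array([1.0, 0.5, -0.25])     # assumed coefficients phi_0, phi_1, phi_2
n_samples = 200_000

# Unit-variance white noise and the moving average of Equation (II.10).
w = rng.standard_normal(n_samples + len(phi) - 1)
x = np.convolve(w, phi, mode="valid")  # x[n] = sum_j phi[j] * w[n + len(phi) - 1 - j]

# Theoretical covariance r(m) = sum_j phi_{j+m} phi_j for lags m = -2, ..., 2.
r = np.correlate(phi, phi, mode="full")
lags = np.arange(-(len(phi) - 1), len(phi))

# Time-average estimates at lags 0, 1, 2 (an ergodicity assumption replaces expectations).
r_hat = np.array([np.mean(x[m:] * x[: len(x) - m]) for m in range(len(phi))])

# Spectral density two ways: DTFT of r versus |Phi(omega)|^2.
omega = np.linspace(-np.pi, np.pi, 257)
S_from_r = (r[None, :] * np.exp(-1j * np.outer(omega, lags))).sum(axis=1).real
S_from_phi = np.abs(np.exp(-1j * np.outer(omega, np.arange(len(phi)))) @ phi) ** 2
```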

A more leisurely presentation of the preceding ideas is given in [1] for the special case of
discrete processes, that is, the case where $G$ is the group of integers. The theory of (Fourier)
analysis on lca groups is succinctly set forth in [2, Chapter 1]. The general theory of orthogonally
scattered measures is due to P. Masani [3]; a brief review occurs in [1]. Second order weakly
stationary processes were first defined on lca groups by J. Kampé de Fériet; a more recent
treatment is [4]. The survey paper [1] contains numerous further references.
II.3  TIME-INVARIANT  FILTERS


Filters are devices which purposefully modify data as, for instance, the Wiener filter already
discussed in Section II.1. Thus, mathematically speaking, a filter can be viewed as simply a
(usually linear) transformation of data, and indeed this is the general view taken in this report.

But traditionally the term ‘filter’ is used in a more specific context, to denote a linear
transformation which is time-invariant and causal (nonanticipating). Thus a filter appears as a
special type of operator on a sample space for the underlying process whose realizations are the
possible data. As such, the filter is defined somewhat independently of the process, the main
requirement being that the domain of the filter contain the path space of the process.

Now when the sample paths are defined along the real line, or a discrete subset thereof, the
filter is termed ‘time-invariant’ if a shift in an input to the filter is preserved in the output. This
notion can be readily extended to the group context: if $G$ is an lca group, and $T$ a filter whose
domain is a translation-invariant space $M$ of functions defined on $G$, then $T$ is called invariant if

$$T[u(\cdot - g)] = T(u)(\cdot - g), \tag{II.11}$$

for each $g \in G$ and $u \in M$. That is, $T$ commutes with the family $\{\tau_g : g \in G\}$ of translation operators
defined by the elements of $G$: $T \circ \tau_g = \tau_g \circ T$, where by definition $\tau_g(u)(x) = u(x - g)$, $g \in G$, $u \in M$.
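A quick finite check of (II.11), assuming $G = \mathbb{Z}_N$ and a randomly chosen kernel `phi` (both ours, for illustration): circular convolution commutes with every translation $\tau_g$, whereas pointwise multiplication by a ramp, a simple time-varying operation, does not.

```python
import numpy as np

N = 10
rng = np.random.default_rng(5)
phi = rng.standard_normal(N)          # hypothetical impulse response on Z_N


def tau(g: int, u: np.ndarray) -> np.ndarray:
    """Translation operator: tau_g(u)(x) = u(x - g), indices taken mod N."""
    return np.roll(u, g)


def T(u: np.ndarray) -> np.ndarray:
    """Circular convolution on Z_N: T(u)(x) = sum over g of u(x - g) phi(g)."""
    return np.array([sum(u[(x - g) % N] * phi[g] for g in range(N)) for x in range(N)])


def ramp(u: np.ndarray) -> np.ndarray:
    """Pointwise multiplication by x: a time-varying, hence non-invariant, operator."""
    return np.arange(N) * u


u = rng.standard_normal(N)
# Equation (II.11): T[u(. - g)] = T(u)(. - g) for every g in G.
invariant = all(np.allclose(T(tau(g, u)), tau(g, T(u))) for g in range(N))
```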

The  mathematical  issues  here  include  the  following: 

(a)  For a given translation-invariant space $M$, what is the structure of operators
$T$ that satisfy Equation (II.11)? This question may be extended to include
cases where the range of $T$ lies in a second translation-invariant space.

(b)  When does there exist a ‘frequency-response function’? Based on the classical
situation this should be a (measurable) function $\phi$, defined on the dual group,
such that the action of $T$ is equivalent under the group (Fourier) transform to
multiplication by $\phi$. A fortiori, the space $M$ of (a) is now $L^2(G)$.

(c)  When can the stochastic process $\{X_g : g \in G\}$ which is generating our observations
be realized in $L^2(G)$, or in some other translation-invariant subspace $M$ for
which the answer to the question of (a) is ‘interesting’?

These are questions difficult to answer at a high level of generality. Most of the available
answers to (a) and (b) have, in fact, only become available in the last 20 years, primarily from
research in harmonic analysis. Here we just indicate a couple of special cases. First let $T$ be a
continuous linear transformation either from $L^1(G)$ into $C(G)$, or from $L^1(G)$ into $L^2(G)$, and
suppose that $T$ commutes with all the translation operators $\tau_g$ [here $C(G)$ is the space of all
continuous functions defined on $G$]. Then $T$ is convolution with a fixed function $\phi$; that is,

$$T(f)(x) = f * \phi(x) = \int_G f(x - g)\,\phi(g)\,dg. \tag{II.12}$$

This was established in [5] along with several other similar results.
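On a finite group, where no continuity questions arise, the structure theorem behind (II.12) can be seen directly. In the sketch below (our own construction, not the method of [5]), an arbitrary matrix is averaged over all translations of $\mathbb{Z}_N$ to produce an operator $T$ commuting with every $\tau_g$; its kernel is then read off as the response to a unit impulse at the identity, $\phi = T(\delta_e)$, and $T$ is checked to equal convolution with $\phi$.

```python
import numpy as np

N = 8
rng = np.random.default_rng(6)


def S(g: int) -> np.ndarray:
    """Matrix of the translation tau_g on C^N: (S(g) u)(x) = u(x - g mod N)."""
    return np.roll(np.eye(N), g, axis=0)


# Average an arbitrary linear map over the group; the result commutes with all translations.
A = rng.standard_normal((N, N))
T = sum(S(g) @ A @ S(g).T for g in range(N)) / N

# Equation (II.12): the kernel phi is the response of T to a unit impulse at the identity.
delta = np.zeros(N)
delta[0] = 1.0
phi = T @ delta


def convolve(f: np.ndarray) -> np.ndarray:
    """(f * phi)(x) = sum over g of f(x - g) phi(g) on Z_N."""
    return np.array([sum(f[(x - g) % N] * phi[g] for g in range(N)) for x in range(N)])
```

The argument is one line: $T\delta_g = T\tau_g\delta_e = \tau_g T\delta_e = \tau_g\phi$, and linearity does the rest.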

Let us next consider the case where $T$ is an operator on $L^2(G)$ that commutes with
translations. Here the situation is a bit murky, but the basic result is that any such $T$ is
convolution with a ‘pseudomeasure’ on $G$ [5, 7]. This includes, but is not limited to, the case
where there is a bounded measure $\mu$ on $G$ for which

$$T(f)(x) = f * \mu(x) = \int_G f(x - g)\,d\mu(g).$$

Things become a bit clearer (as is so often the case!) by taking a group (Fourier) transform.
Under its action, suitably defined [5, Section IV], pseudomeasures correspond uniquely to
(locally) essentially bounded functions on the dual group, and the operation of convolution with
a pseudomeasure goes over into multiplication by the corresponding function. In this way we
obtain a general answer to (b).

A similar result, but allowing $T$ to be only closed and densely defined, was later obtained in [6]
by very different (operator-theoretic) methods, where it was termed a ‘generalized Bochner
theorem’. The original Bochner theorem (1929) pertained to the case where $G$ is the group of real
numbers. So, the upshot is that, when $G$ is an lca group, any invariant filter on $L^2(G)$ is unitarily
equivalent, via the group (Fourier) transform, to a bounded multiplication operator on $L^2(\Gamma)$.
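For $G = \mathbb{Z}_N$ this unitary equivalence is the familiar statement that the DFT diagonalizes circulant matrices. The sketch assumes a random kernel `phi` and uses NumPy's unitary DFT (`norm="ortho"`) as the group transform; it checks that $F T F^{*}$ is the diagonal multiplication operator by $\hat\phi$ and that the operator norm of $T$ equals $\sup|\hat\phi|$.

```python
import numpy as np

N = 12
rng = np.random.default_rng(7)
phi = rng.standard_normal(N)                     # hypothetical kernel of an invariant filter
T = np.array([[phi[(x - y) % N] for y in range(N)] for x in range(N)])  # T f = f * phi

F = np.fft.fft(np.eye(N), norm="ortho")          # unitary DFT matrix: the group transform on Z_N
freq_response = np.fft.fft(phi)                  # the multiplier on the dual group

# Unitary equivalence: F T F^* is multiplication by freq_response.
D = F @ T @ F.conj().T
```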

In general, any operator acting between a pair of translation-invariant function spaces
defined on an lca group $G$, and commuting with the translation operators, is termed a ‘multiplier’
for that pair. Thus, for example, it can be shown that the correspondence $T \leftrightarrow \phi$ defined in
Equation (II.12) above is actually an isometric isomorphism between the space of all multipliers
for the pair $[L^1(G), L^2(G)]$ and the space $L^2(G)$. Similarly, the space of multipliers of $L^2(G)$ is
isometrically isomorphic to $L^\infty(\Gamma)$, and also to the (suitably defined and normed) space of
pseudomeasures on $G$. Which form of the multiplier (invariant filter) we use depends on whether
we want to operate in the ‘time domain’ $G$ or the ‘frequency domain’ $\Gamma$. The abstract theory of
multipliers is surveyed by Larsen [7].

Finally, as to question (c), we observe that many interesting processes on $G$ cannot be
realized in $L^2(G)$, although there is, of course, no problem when $G$ is of finite order (the case of
interest in the next chapter). Otherwise, we have a simple sufficient condition, under a mild
measurability restriction on the process: that the function $g \mapsto \operatorname{var}(X_g)$ be integrable. It is
interesting to note (for some, if not for present purposes) that a (measurable) second order
process on $G$ can always be realized in some weighted $L^2$ space over $G$ [8]. For this result the
group structure of $G$ plays no role and, indeed, such spaces need not be translation invariant.

In this brief review of sample space filtering we have concentrated on understanding the
time-invariance aspect in the general group context. Causality depends on an ordering of the
group, so that the terms ‘past’ and ‘future’ have a meaning. When $G$ is a subgroup of the real
numbers the frequency response function of a stable causal time-invariant filter can be extended
to an analytic function. But it seems to be somewhat artificial to try to define these terms in
general, so we will not pursue the matter further.
A second point of view of the concept of a filter is to regard it as a linear transformation
on the (Hilbert) space generated by the random variables comprising the process of interest. Thus
let $\{X_g : g \in G\}$ be a weakly stationary $H$-valued process defined on the lca group $G$. As before, the
subspace of $H$ spanned by the $X_g$ is denoted $H_X$. We know from Section II.2 that there is a
unitary representation $g \mapsto U(g)$ associated with this process. This in turn, according to
Equation (II.7), is defined by a resolution of the identity $E(\cdot)$ on $\Gamma$. An operator $T$ on $H_X$ might
now be called time-invariant (sometimes also called ‘U-stationary’) if it commutes with all $U(g)$,
$g \in G$. A question of long interest is to determine the structure of such operators, which for short
we may now refer to as ‘filters’. The basic goal is to assert that $T$ is an $E$-integral; that is,

$$T = \int_\Gamma \phi(\gamma)\,dE(\gamma), \tag{II.13}$$

where $\phi$ is some measurable function on $\Gamma$. This equation means that for $y \in H_X$,

$$T(y) = \int_\Gamma \phi(\gamma)\,dW_y(\gamma), \tag{II.14}$$

where $W_y$ is the orthogonally scattered $H_X$-valued measure defined by $W_y(B) = E(B)y$, $B \subset \Gamma$.
Actually, the integral in Equation (II.14) can only be defined provided that $\phi \in L^2(\mu_y)$, where $\mu_y$
is the spectral measure on $\Gamma$ associated with $W_y$: $\mu_y = \|W_y(\cdot)\|^2$. This requirement, sometimes
termed the ‘matching condition’ in the signal processing literature [10], naturally holds when $\phi$ is
(essentially) bounded. The function $\phi$ may now be termed the frequency response function of the
filter $T$. Conditions for the validity of Equation (II.13) are provided in [6, 9]; they involve further
restrictions on either the group $\{U(g) : g \in G\}$ or on the operator $T$. The simplest of these conditions
[6] is that there should exist a cyclic vector in $H_X$ for $\{U(g) : g \in G\}$ or, in other words, this group of
operators should have unit multiplicity.

If $y_g = T(X_g)$, $g \in G$, represents the output of the filter $T$, it is clear that $\{y_g\}$ is again weakly
stationary with the same shift group $\{U(g) : g \in G\}$ as $\{X_g\}$. The value of the representation (II.13) for
$T$ is that it permits an easy but rigorous derivation of the ‘frequency domain’ behavior of the
filter. Namely, if $W_x$ and $W_y$ are the orthogonally scattered measures corresponding to $\{X_g\}$ and
$\{y_g\}$, respectively, then it follows that

$$W_y(B) = \int_B \phi(\gamma)\,dW_x(\gamma)$$

for all Borel sets $B \subset \Gamma$. For the associated spectral measures this implies that

$$\mu_y(B) = \int_B |\phi(\gamma)|^2\,d\mu_x(\gamma),$$

and consequently that $d\mu_y/d\mu_x = |\phi(\cdot)|^2$. Finally, we see that if $\mu_x$ is absolutely continuous with
respect to Haar measure on $\Gamma$, then so is $\mu_y$; if so, the corresponding spectral density functions $f_x$
and $f_y$ are related by

$$f_y = |\phi(\cdot)|^2 f_x \quad \text{a.e.}$$

This is a fundamental relationship in signal processing and points up the important role of the
function $|\phi(\cdot)|^2$, known as the ‘filter gain’.
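A finite-dimensional check of the filter-gain relation, under assumptions of our own: $G = \mathbb{Z}_N$ (so Haar measure is counting measure and spectral densities are just weight vectors), random input weights `mu_x`, and a random frequency response `phi`. The filter is applied in the time domain as circular convolution with $h = \mathrm{ifft}(\phi)$; the output covariance $H C_x H^{*}$ is compared with the circulant covariance built from $|\phi|^2\mu_x$.

```python
import numpy as np

N = 16
rng = np.random.default_rng(8)
k = np.arange(N)

mu_x = rng.uniform(0.1, 1.0, N)                              # assumed input spectral weights
phi = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # frequency response on the dual


def circulant_cov(mu: np.ndarray) -> np.ndarray:
    """Covariance matrix C[g, h] = r(g - h), where r(g) = sum over gamma of <g, gamma> mu[gamma]."""
    r = N * np.fft.ifft(mu)
    return r[(k[:, None] - k[None, :]) % N]


# Time-domain form of the filter: circular convolution with h = ifft(phi), as a matrix.
h = np.fft.ifft(phi)
H = h[(k[:, None] - k[None, :]) % N]

# Output covariance computed directly: C_y = H C_x H^*.
C_y = H @ circulant_cov(mu_x) @ H.conj().T

# Prediction from the filter-gain relation: mu_y = |phi|^2 mu_x.
C_y_predicted = circulant_cov(np.abs(phi) ** 2 * mu_x)
```

Note that the phase of $\phi$ drops out entirely; only the gain $|\phi|^2$ affects second-order statistics of the output.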

Derivation of these results and discussion of some of their implications are given in [1]. We
emphasize once more that we have indicated two complementary approaches to rigorous
time-invariant filter definition and design over groups: filtering can be done by an operator either on
the sample space or on the space generated by random variables. In the first case the filter is
defined independently of the process; otherwise, it is defined directly in terms of the unitary shift
group of the process, which is assumed weakly stationary.
II.4  NONSTATIONARY  SIGNALS


Unfortunately, the elegant theory of weakly stationary signals fails to encompass many
signals of interest. Such signals may exhibit a time-dependent mean (a ‘trend’) and time-dependent
higher moments. There are two generic approaches to this real difficulty: transform the data to regain
stationarity, or recognize the data as belonging to a broader signal class for which some (useful)
structure theory exists. The first approach includes such filtering tasks as detrending and general
‘prewhitening’ of the data along with statistical tests of stationarity applied to the residuals. The
other approach is of greater mathematical substance and involves defining and characterizing
larger classes of signals, hopefully retaining some of the useful results available in the stationary
class.

This transition away from stationarity bears some analogy with similar movements in more
familiar settings: from linear to nonlinear differential equations, say, or from normal to
non-normal operators. In each of these contexts we leave behind a highly developed and successful
discipline to encounter a comparative wilderness, which can at best be comprehended by a variety
of special cases, techniques and approximations.

A succession of attempts by eminent probabilists (Loève, Karhunen, Cramér, Bochner,
Rozanov, Rao) to define a viable extension of (weak) stationarity began almost 40 years ago and
has culminated in the concept of (weak) harmonizability. It is now understood that this concept
may be approached in several equivalent ways, each resulting from a generalization of the
corresponding construct in the stationarity theory. Thus one may attempt to extend the
Wiener-Khinchine relation, Equation (II.6), between the covariance function and the spectral measure of
the process; the Cramér representation, Equation (II.8), of the process as the Fourier transform
of an orthogonally scattered vector measure; or the operator description obtained from
Equation (II.7) and the concept of a unitary representation. In particular, the dependence of the
covariance function or, equivalently, of the spectral measure on a single variable must be relaxed.
Yet at the same time it is desirable to maintain ties with Fourier analysis so as to conserve the
frequency interpretation of linear filtering.

This is not the place to delve into the many measure-theoretic technicalities required to
make the various definitions of weak harmonizability precise and to establish their equivalence.
For such details the recent survey of M. Rao [11] may be consulted. Here we will just take note
of a few highlights.

Probably the most straightforward definition is that an $H$-valued weakly harmonizable
process on an lca group $G$ is a bounded weakly continuous mapping $g \mapsto X_g$ from $G$ into $H$ of the
form

$$X_g = \int_\Gamma \langle g, \gamma\rangle\,dW(\gamma), \tag{II.15}$$

where $W$ is a vector measure on the Borel algebra of the dual group $\Gamma$. Thus $W$ is merely
countably additive and no longer necessarily orthogonally scattered as in the earlier stationary
case. Equation (II.15) is a very general Fourier transform relation, so that this new concept
conserves some link with harmonic analysis.
One immediate consequence of this definition is that if $T$ is a bounded linear operator on $H$,
and $\{X_g : g \in G\}$ is a weakly harmonizable $H$-valued process, then so is $Y_g = T(X_g)$. Thus this new
class of processes is closed under the application of arbitrary bounded linear operations, a situation
which definitely does not obtain for the stationary processes (or even for other more restricted
definitions of harmonizable processes). This fact has the implication that any well-defined linear
filter, time-invariant or not, applied to a weakly harmonizable input yields an output of the same
sort, an observation first made in a more restricted fashion in [12]. Hence, for systems analysis
purposes this is as wide a class of processes as needs be considered.

A second, but less immediate consequence of the definition is the following expression for
the covariance function $R$ of the process:

$$R(g, h) = \langle X_g, X_h\rangle = \int_\Gamma\int_\Gamma \langle g, \gamma\rangle\,\overline{\langle h, \lambda\rangle}\,F(d\gamma, d\lambda), \tag{II.16}$$

where the integrator set function $F$ is defined by

$$F(A, B) = \langle W(A), W(B)\rangle, \tag{II.17}$$

for $A$, $B$ Borel subsets of $\Gamma$. The integration in Equation (II.16) can be a little tricky since $F$
need not define a measure on $\Gamma \times \Gamma$, unless it is of bounded variation. However, when the proper
integral is used (the so-called Morse-Transue integral) as explained in [11], then the covariance
formula, Equation (II.16), provides a characterization of weakly harmonizable processes. Even
when $F$ does define a bona fide measure, which we might then refer to as the spectral measure of
the process, we observe from Equation (II.17) that it will generally be complex-valued, unlike the
positive spectral measures corresponding to the weakly stationary processes. Weakly harmonizable
processes with a spectral measure are now commonly referred to as strongly harmonizable and
are often encountered in the engineering literature (primarily for the case $G = \mathbb{R}$, the real numbers;
[12, 13]).
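On $G = \mathbb{Z}_N$ the integrals in (II.16) become finite sums, and $F$ is simply an $N \times N$ Hermitian positive semidefinite matrix. The sketch below (our illustration, with randomly chosen $F$) computes $R = [\langle g,\gamma\rangle]\,F\,[\langle h,\lambda\rangle]^{*}$ and checks that a diagonal $F$ (orthogonally scattered $W$) yields a covariance depending only on $g - h$, while a general complex-valued $F$ does not.

```python
import numpy as np

N = 8
rng = np.random.default_rng(9)
k = np.arange(N)
chars = np.exp(2j * np.pi * np.outer(k, k) / N)   # chars[g, gamma] = <g, gamma>


def covariance(F: np.ndarray) -> np.ndarray:
    """Equation (II.16) on Z_N: R(g, h) = sum over gamma, lam of <g, gamma> conj(<h, lam>) F(gamma, lam)."""
    return chars @ F @ chars.conj().T


def depends_only_on_difference(R: np.ndarray) -> bool:
    """True if R(g, h) = R(g - h, 0) for all g, h, i.e. weak stationarity."""
    return all(np.isclose(R[g, h], R[(g - h) % N, 0]) for g in range(N) for h in range(N))


# Orthogonally scattered W: F is diagonal (the stationary case).
F_stat = np.diag(rng.uniform(0.0, 1.0, N))

# A general vector measure: with W({gamma}) = row gamma of B, F = B B^* has complex off-diagonal entries.
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
F_harm = B @ B.conj().T

R_stat = covariance(F_stat)
R_harm = covariance(F_harm)
```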

If $\{X_g : g \in G\}$ is a weakly stationary $H$-valued process, and $P$ is an orthogonal projection on
$H$, then by what has already been noted the process $\{P(X_g) : g \in G\}$ is weakly harmonizable.
Remarkably, there is a valid converse statement which provides an elegant characterization of
weakly harmonizable processes [11, 14]. Namely, let $\{Y_g : g \in G\}$ be weakly harmonizable in the
Hilbert space $H$. Then there exist a larger Hilbert space $K$ and a weakly stationary $K$-valued
process $\{X_g : g \in G\}$ such that $Y_g = P(X_g)$, $g \in G$, where $P$ is the orthogonal projection from $K$ onto
$H$. Thus each weakly harmonizable process on the lca group $G$ appears as a projection of an
associated weakly stationary process on $G$, defined in an enlarged Hilbert space. Equivalently, a
weakly harmonizable process can always be ‘dilated’ to a weakly stationary process. This fact is
naturally related to previously known results concerning unitary dilations of contraction
operators, and to Naimark’s theorem concerning the dilation of a positive-definite operator
function on $G$ to a unitary representation of $G$ [15].
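The easy direction of this correspondence can be seen numerically. Under assumptions of our own ($G = \mathbb{Z}_N$, $K = \mathbb{C}^N$, the shift-generated stationary process $X_g = U(g)x$, and a random 3-dimensional subspace playing the role of $H$), the sketch projects the process, confirms that stationarity is lost, and shows that $Y_g$ still has the harmonizable form (II.15) with the non-orthogonal measure $W_Y(\{\gamma\}) = P\,E(\{\gamma\})x$.

```python
import numpy as np

N = 8
rng = np.random.default_rng(10)

# Weakly stationary process in K = C^N: X_g = U(g) x with (U(g) x)(m) = x(m + g).
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.array([np.roll(x, -g) for g in range(N)])           # row g holds X_g

# Orthogonal projection P of K onto a (hypothetical) 3-dimensional subspace H.
Q, _ = np.linalg.qr(rng.standard_normal((N, 3)) + 1j * rng.standard_normal((N, 3)))
P = Q @ Q.conj().T
Y = X @ P.T                                                 # row g holds Y_g = P X_g


def gram(Z: np.ndarray) -> np.ndarray:
    """Gram matrix G[g, h] = <Z_g, Z_h>, with the inner product linear in the first slot."""
    return Z @ Z.conj().T


def is_stationary(R: np.ndarray) -> bool:
    n = len(R)
    return all(np.isclose(R[g, h], R[(g - h) % n, 0]) for g in range(n) for h in range(n))


# Harmonizable form: Y_g = sum over gamma of <g, gamma> W_Y({gamma}), W_Y({gamma}) = P E({gamma}) x.
n = np.arange(N)
V = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)    # columns: normalized characters
W_Y = [P @ (V[:, k] * np.vdot(V[:, k], x)) for k in range(N)]
Y_rebuilt = np.array([sum(np.exp(2j * np.pi * g * k / N) * W_Y[k] for k in range(N)) for g in range(N)])
F_Y = np.array([[np.vdot(W_Y[l], W_Y[k]) for l in range(N)] for k in range(N)])  # F_Y[k, l] = <W_Y(k), W_Y(l)>
```

The off-diagonal entries of `F_Y` are exactly what distinguishes the projected process from a stationary one.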

A rather different approach to the treatment of nonstationary signals also originated in the
mid 1940s, and continues vigorously into the present time. This approach, the joint
time-frequency representation of signals, emanates primarily from the community of physicists and
electrical engineers, and has associated with it the names of Wigner, Gabor, Ville, Woodward,
Rihaczek, Cohen, inter alia. The analysis and design of radar waveforms [16, 17] was a primary
engineering motivation. More recently, this work has been applied to the detection of
phase-modulated signals in noise [18]. In a different direction, efforts have been made to absorb some
of the resulting constructs (such as the radar ambiguity function) into a general mathematical
framework, specifically, that of nilpotent harmonic analysis [19, 20]. The key mathematical object
in this work is a certain Lie group known as the real Heisenberg group.

The  essential  idea  here  is  that  the  energy  of  nonstationary  signals  is  distributed  in  both  time 
and  frequency.  This  is  already  clear  with  audio  signals  where  both  the  pitch  and  the  time  of 
origin  of  a  tone  can  be  heard.  It  is  also  a  well-known  aspect  of  radar  signals  where  the  echo  is 
subjected  to  both  a  time  delay  and  a  frequency  (Doppler)  shift,  depending  on  the  range  and 
radial  velocity  of  the  target.  Hence  it  is  desirable  to  express  the  signal  as  a  function  of  both  time 
and  frequency.  Without  trying  to  be  overly  detailed  we  next  indicate  two  generic  approaches  to 
this  problem. 

As we know from Section II.2, the spectral content of a weakly stationary process is
independent of the time index. In the standard cases of engineering practice, this index runs
through either the group of real numbers or a discrete subgroup thereof; in such cases consistent
estimates of the spectral density function of the process are conventionally obtained by averaging
or smoothing the squared magnitude of the Fourier transform of suitably windowed segments of a
sample function of the process. (Naturally, an ergodic hypothesis must be invoked here for the validity
of such inferences.) The frequency resolution of the resulting estimates then depends on the
window length. Now when the signal is not stationary we may, on the one hand, try to select a
window of sufficiently short length that the signal portion under this window is approximately
stationary. In this way we are led to a compromise between resolution in time and in frequency.
Sliding a given window along the data then permits a display of the variations in frequency as a
function of time. Such joint functions of time and frequency are called ‘spectrograms’ [21].
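A minimal spectrogram sketch in Python with NumPy. The sampling rate, the Hann window, the window length of 256 samples, and the hop of 64 samples are all our own illustrative choices. The test signal switches from 50 Hz to 200 Hz halfway through; the dominant frequency in each window tracks the switch, at a frequency resolution of about $f_s/256 \approx 3.9$ Hz.

```python
import numpy as np

fs = 1000.0                                  # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
# A nonstationary test signal: 50 Hz for the first second, 200 Hz for the second.
x = np.where(t < 1.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 200 * t))


def spectrogram(x: np.ndarray, win_len: int, hop: int) -> np.ndarray:
    """Squared magnitude of the DFT of Hann-windowed segments; one column per window position."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T   # shape (frequency bins, frames)


win_len, hop = 256, 64       # shorter windows: finer time resolution, coarser frequency resolution
S = spectrogram(x, win_len, hop)
freqs = np.fft.rfftfreq(win_len, d=1 / fs)
peak_freq = freqs[S.argmax(axis=0)]          # dominant frequency in each frame
```

Halving `win_len` would localize the switch more sharply in time while doubling the frequency bin width, which is the compromise described above.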

A second approach to nonstationary signal analysis is to attempt a direct definition of a
function of time $t$ and frequency $\omega$ which somehow measures the distribution of signal energy
over the $(t, \omega)$ plane. There are several desiderata here. The correspondence between a signal $f$
and its proposed distribution $F$ should ideally be

(1)  bilinear in $f$

(2)  non-negative

(3)  possessed of correct marginals.

This last requirement refers to the result of integrating $F$ over all $\omega$ or over all $t$:

$$\int F(t, \omega)\,d\omega = |f(t)|^2, \qquad \int F(t, \omega)\,dt = |\hat f(\omega)|^2,$$

where $\hat f$ is the suitably normalized Fourier transform of $f$. Unfortunately it is not possible to
satisfy all three of these desiderata simultaneously.

At present there are many choices available of what might be called ‘pseudodistributions’,
that is, functions $F$ obeying some of the above criteria. For instance, the rule


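$$F(t, \omega) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-i\theta t - i\tau\omega + i\theta u}\,k(\theta, \tau)\,\overline{f(u - \tau/2)}\,f(u + \tau/2)\,du\,d\tau\,d\theta, \tag{II.18}$$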


where k is a rather arbitrary kernel subject to k(θ, 0) = k(0, τ) = 1, has the correct marginals, and
is bilinear in f provided that k is independent of the signal f. These distributions are said to
constitute Cohen's class [22]. Special choices of k yield many popular distributions, including
those of Wigner (take k = 1) and Rihaczek [take $k(\theta,\tau) = e^{i\theta\tau/2}$; the choice $k = \cos(\theta\tau/2)$ gives its real part,
the Margenau-Hill distribution]. The Wigner distribution is of interest for several reasons: its 2D Fourier
transform is the familiar ambiguity function of radar theory, and any member of Cohen's class can be
obtained from it by convolution with an appropriate measure. Further, in a certain technical sense, the
Wigner distribution comes closest, among members of Cohen's class, to being non-negative [22]. In addition
to these somewhat theoretical reasons, the Wigner distribution has been applied to a variety of practical
data processing tasks; see [18] and the thesis [23] together with its references.
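To illustrate the time-marginal property, here is a minimal sketch of a discrete, circular analogue of the Wigner distribution (k = 1 in (II.18)). The discretization (lag products taken modulo N and transformed over the lag with `numpy.fft.fft`) is our own assumption, not the report's; such simple discretizations are known to suffer from aliasing unless the signal is oversampled, so only the time marginal is checked here. The helper name `discrete_wigner` is hypothetical.

```python
import numpy as np

def discrete_wigner(x):
    """Circular discrete Wigner-type distribution of a length-N signal.

    W[n, k] = sum_m x[n+m] conj(x[n-m]) exp(-2*pi*i*k*m/N), indices mod N.
    The lag product r_n[m] = x[n+m] conj(x[n-m]) satisfies r_n[-m] = conj(r_n[m]),
    so every row of W is real up to rounding error.
    """
    x = np.asarray(x, dtype=complex)
    N = x.size
    n = np.arange(N)[:, None]                 # time index (rows)
    m = np.arange(N)[None, :]                 # lag index (columns)
    lag_products = x[(n + m) % N] * np.conj(x[(n - m) % N])
    return np.fft.fft(lag_products, axis=1)   # transform over the lag

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
W = discrete_wigner(x)

# Time marginal: summing over the frequency index k gives N * |x[n]|^2.
print(np.allclose(W.real.sum(axis=1), x.size * np.abs(x) ** 2))   # True
print(np.abs(W.imag).max())                                       # rounding-level
print(W.real.min())   # typically negative: W is not a true energy density
```

The last line illustrates why the Wigner distribution is only a 'pseudodistribution': for a generic signal some values are negative.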

The study of the many properties, interrelations and applications of these functions is a
major occupation of modern signal processing workers. Extension of these aspects from the real
line to its discrete subgroups has begun, and in time we may expect the theory to progress slowly
to signals defined over more general lca groups.

In addition to the aspect just suggested of one possible role of group theory in joint
time-frequency signal analysis, there is another, both more profound and less expected. Restricting our
attention for the remainder of this section to signals defined on the group R of real numbers, we
are referring to the appearance of the real three-dimensional Heisenberg group $H_1$.
According to Schempp [19], this group ". . . stands at the crossroads of quantum mechanics and
signal theory." Again, without intending to be overly detailed, we indicate a little of the relevant
background.

The mathematical embodiment of the uncertainty principle for conjugate quantities in
nonrelativistic quantum mechanics occurs both as Heisenberg's inequality for Fourier transforms
of functions in $L^2(\mathbb{R})$:

$$\int_{-\infty}^{\infty} t^2 |f(t)|^2\,dt \cdot \int_{-\infty}^{\infty} \omega^2 |\hat f(\omega)|^2\,d\omega \;\ge\; \frac{E_f^2}{16\pi^2}, \qquad \text{(II.19)}$$

where $E_f = \|f\|_2^2$, and as the commutation relation

$$PQ - QP = cI \qquad \text{(II.20)}$$

for a corresponding pair of self-adjoint operators P, Q and a pure imaginary constant c. In the
standard single-particle case, P and Q are the momentum and position observables, and c = h/2πi,
where h is Planck's constant. The Heisenberg inequality then expresses the impossibility of exact
simultaneous measurement of both these quantities. It does this by interpreting
Equation (II.19) as a lower bound on the product of the variances of the observables P and Q in
any state f. (When $E_f = 1$, the set function

$$B \mapsto \int_B |f(t)|^2\,dt$$

defines a probability measure on the real line R, and is interpreted as the probability that
the particle is found in the Borel set B.)
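The constant $1/16\pi^2$ in (II.19) matches the Fourier convention $\hat f(\omega) = \int f(t)e^{-2\pi i\omega t}\,dt$, which we assume in the following sketch. It evaluates both sides of (II.19) by Riemann sums on a truncated grid (grid size and test functions are our choices): the Gaussian $e^{-\pi t^2}$ attains equality, while $te^{-\pi t^2}$ exceeds the bound by a factor of 9.

```python
import numpy as np

t = np.linspace(-6.0, 6.0, 1201)      # truncated grid; step 0.01
dt = t[1] - t[0]
w = t.copy()
# Assumed convention: fhat(w) = integral of f(t) exp(-2*pi*i*w*t) dt.
kernel = np.exp(-2j * np.pi * np.outer(w, t))

def uncertainty_ratio(f):
    """(int t^2|f|^2 dt * int w^2|fhat|^2 dw) / (E_f^2 / 16 pi^2), via Riemann sums."""
    fhat = (kernel @ f) * dt
    energy = np.sum(np.abs(f) ** 2) * dt
    spread_t = np.sum(t ** 2 * np.abs(f) ** 2) * dt
    spread_w = np.sum(w ** 2 * np.abs(fhat) ** 2) * dt
    return spread_t * spread_w / (energy ** 2 / (16 * np.pi ** 2))

gaussian = np.exp(-np.pi * t ** 2)
hermite1 = t * np.exp(-np.pi * t ** 2)
print(uncertainty_ratio(gaussian))    # approximately 1.0: equality
print(uncertainty_ratio(hermite1))    # approximately 9.0
```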


The relation (II.20) was given a group-theoretic interpretation long ago by Weyl, who
replaced the operators P and Q by the one-parameter groups of unitary operators which they
generate. Further, a connection was made with the Heisenberg group $H_1$ already mentioned. By
definition, $H_1$ is the group of all 3 × 3 real matrices of the form

$$g = \begin{pmatrix} 1 & a & c \\ 0 & 1 & b \\ 0 & 0 & 1 \end{pmatrix}. \qquad \text{(II.21)}$$


As such, $H_1$ is both noncompact and nonabelian. It turns out that there is a deep connection
between the group version of Equation (II.20) and corresponding commutation relations between
the one-parameter subgroups of $H_1$ obtained from the matrices (II.21) by fixing two of
a, b, c at zero. From this one can eventually obtain a description of the irreducible unitary
representations (of dimension > 1) of $H_1$ on $L^2(\mathbb{R})$. These have the form

$$U_\lambda(g)f(t) = e^{i\lambda(c + ta)}\, f(t + b), \qquad f \in L^2(\mathbb{R}), \qquad \text{(II.22)}$$

for $g \in H_1$ and for any real $\lambda \ne 0$. In particular, $U_1$ is often called the linear Schrödinger
representation of $H_1$. For this background material we refer to [24, 25].
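A finite-dimensional analogue may help fix ideas; it is our own illustration, not taken from the report. On $\mathbb{C}^N$, with indices read modulo N, cyclic translation S and modulation M play the roles of the translation by b and the multiplication by $e^{i\lambda ta}$ in (II.22), and they satisfy the Weyl-type relation $SM = e^{2\pi i/N}MS$. The operators $e^{2\pi ic/N}M^aS^b$ then have exactly the form (II.22) with $\lambda = 2\pi/N$; the function name `U` is hypothetical.

```python
import numpy as np

N = 8
omega = np.exp(2j * np.pi / N)
t = np.arange(N)

# Cyclic translation (S f)[t] = f[t + 1] and modulation (M f)[t] = omega**t * f[t].
S = np.roll(np.eye(N), -1, axis=0)
M = np.diag(omega ** t)

def U(a, b, c):
    """Finite analogue of (II.22): (U f)[t] = exp(2*pi*i*(c + t*a)/N) * f[t + b]."""
    return omega ** c * np.linalg.matrix_power(M, a) @ np.linalg.matrix_power(S, b)

# Weyl relation: S M = omega M S, so S and M do not commute.
print(np.allclose(S @ M, omega * M @ S))    # True
print(np.allclose(S @ M, M @ S))            # False

f = np.random.default_rng(0).standard_normal(N)
a, b, c = 3, 2, 5
print(np.allclose(U(a, b, c) @ f, omega ** (c + t * a) * f[(t + b) % N]))   # True
```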

Now, even though there is no analogue of Planck's constant in signal theory, there is a
well-known uncertainty principle that applies to radar measurements. It essentially places limits on
achievable resolution performance in range and range rate. This principle can be arrived at
through an analysis of the ambiguity function A. As already noted, A is the (two-dimensional) Fourier
transform of the Wigner distribution W, a special case of Equation (II.18). Up to an inessential phase factor,
the ambiguity function is obtained by cross-correlating a given signal with its time- and frequency-shifted
version:

$$A(\tau,\omega) = \int_{-\infty}^{\infty} f\!\left(t + \tfrac{\tau}{2}\right)\overline{f\!\left(t - \tfrac{\tau}{2}\right)}\,e^{-i\omega t}\,dt \qquad \text{(II.23)}$$
Unlike the real-valued W, this function may be complex-valued. Nevertheless, we always have

$$|A(\tau,\omega)| \le A(0,0)$$

and (for a suitable normalization of the measure on the (τ, ω) plane)

$$\|A\|_2 = A(0,0) = E_f,$$

indicating a constant volume under the surface $|A(\tau,\omega)|^2$, independent of the signal f. This
implies the impossibility of concentrating A (or W, for that matter) around a particular point, as
would be desirable for separating closely spaced targets. In general, it follows that separability in
one variable can be gained only at the expense of self-clutter and masking in the other variable.

Now, rewriting the ambiguity function A in Equation (II.23) as

$$A(\tau,\omega) = \phi\int_{-\infty}^{\infty} f(u+\tau)\,\overline{f(u)}\,e^{-i\omega u}\,du \qquad (|\phi| = 1)$$

and comparing with the representation formula (II.22), we see that, up to the phase factor φ,

$$A(\tau,\omega) = \langle U_1(g_0)f, f\rangle. \qquad \text{(II.24)}$$

Here $g_0$ is restricted to elements of the form (II.21) with c = 0 (and with the identifications a = −ω,
b = τ), and the bracket on the right-hand side denotes the inner product in $L^2(\mathbb{R})$.
Intuitively, we think of f as the envelope of a radar pulse of finite energy, and Equation (II.24)
expresses the cross-correlation of the pulse with its echo.
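On the finite group $\mathbb{Z}/N$ the same objects can be checked numerically. The sketch below uses our own discretization and a hypothetical helper `ambiguity`: it computes the periodic ambiguity function $A[\tau,k] = \sum_u x[u+\tau]\overline{x[u]}e^{-2\pi iku/N}$ (the discrete counterpart of the form used just above), confirms that it equals the inner product of the time- and frequency-shifted signal with the original, as in (II.24), and checks the two properties stated earlier: $|A| \le A[0,0]$, and a total volume $\sum|A|^2 = N\,A[0,0]^2$ that does not depend on the signal.

```python
import numpy as np

def ambiguity(x):
    """Periodic ambiguity function A[tau, k] = sum_u x[u + tau] conj(x[u]) exp(-2*pi*i*k*u/N)."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    u = np.arange(N)
    # Row tau: DFT over u of the lag product x[u + tau] * conj(x[u]).
    return np.array([np.fft.fft(x[(u + tau) % N] * np.conj(x)) for tau in range(N)])

rng = np.random.default_rng(1)
N = 32
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = ambiguity(x)
energy = np.vdot(x, x).real              # A[0, 0] = energy of x

# Analogue of (II.24): A[tau, k] = <M_{-k} S_tau x, x>, S_tau a cyclic shift.
tau, k = 3, 5
u = np.arange(N)
shifted = np.exp(-2j * np.pi * k * u / N) * x[(u + tau) % N]
print(np.isclose(A[tau, k], np.vdot(x, shifted)))              # True
print(np.all(np.abs(A) <= energy + 1e-9))                      # True
print(np.isclose(np.sum(np.abs(A) ** 2), N * energy ** 2))     # True: constant volume
```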

The foregoing relation (II.24) is the basic connection between the theory of radar waveform
design and harmonic analysis on the real Heisenberg group. For a detailed survey of this link
and its many consequences, one may consult the papers of Schempp ([27] and the literature cited there) and
[20]. Unfortunately, the mathematical prerequisites for a careful development of this material are
rather severe.

The Heisenberg group concept is somewhat more general and ubiquitous than might be
inferred from the preceding remarks. Given any commutative ring with identity or any lca group,
an associated Heisenberg group can be defined. Thus, in the ring case, we can use the matrices of
the form (II.21), with a, b, c ring elements. Or, in the case of an lca group G with dual group Γ, we can take the
set G × Γ × T with multiplication

$$(g,\gamma,s)\cdot(h,\eta,t) = (g+h,\ \gamma+\eta,\ st\,\langle g,\eta\rangle)$$

as the associated Heisenberg group; here T is the circle group and ⟨g, η⟩ is the value of the character η at g.
The resulting construct plays a key role in many aspects of the accompanying harmonic analysis [26-30].
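For a concrete instance, take G = Z/N, whose dual group can be identified with Z/N through the pairing $\langle g,\eta\rangle = e^{2\pi ig\eta/N}$ (this choice of G and of the identification are ours, for illustration). The sketch implements the multiplication law above and checks on random elements that it is associative but not commutative.

```python
import numpy as np

N = 6  # G = Z/N, with dual group Gamma also identified with Z/N

def pairing(g, eta):
    """Character pairing <g, eta> = exp(2*pi*i*g*eta/N)."""
    return np.exp(2j * np.pi * g * eta / N)

def mul(x, y):
    """(g, gamma, s) . (h, eta, t) = (g + h, gamma + eta, s * t * <g, eta>)."""
    g, gamma, s = x
    h, eta, t = y
    return ((g + h) % N, (gamma + eta) % N, s * t * pairing(g, eta))

def same(x, y):
    """Equality of group elements, with a tolerance on the circle component."""
    return x[0] == y[0] and x[1] == y[1] and bool(np.isclose(x[2], y[2]))

rng = np.random.default_rng(2)

def random_element():
    g, gamma = rng.integers(0, N, size=2)
    return (int(g), int(gamma), np.exp(2j * np.pi * rng.random()))

triples = [(random_element(), random_element(), random_element()) for _ in range(200)]
associative = all(same(mul(mul(x, y), z), mul(x, mul(y, z))) for x, y, z in triples)
x, y = (1, 0, 1.0), (0, 1, 1.0)
print(associative)                     # True
print(same(mul(x, y), mul(y, x)))      # False: the group is nonabelian
```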

For example, when G is a (separable) lca group, the group (Fourier) transform
$F_G: L^2(G) \to L^2(\Gamma)$ can be shown to intertwine two irreducible (unitary) representations of H, on
$L^2(G)$ and on $L^2(\Gamma)$; here H is the Heisenberg group associated with G. This essentially
characterizes $F_G$ and leads to a factorization of $F_G$ into a product of three unitary operators.
When specialized to the finite group Z/rsZ, where Z is the group of integers and r, s are integers
> 1, the Cooley-Tukey FFT algorithm is obtained [28, 29]. This algorithm is also derived in
Section III.4 below by a more direct group-theoretic procedure.
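The index bookkeeping behind one Cooley-Tukey step for N = rs can be sketched directly. This is the textbook mixed-radix factorization written with naive DFTs for the short transforms; it is not the Heisenberg-group derivation of [28, 29], and the function names are ours. Writing $n = rn_2 + n_1$ and $k = k_1 + sk_2$, the length-N DFT splits into r DFTs of length s, a pointwise multiplication by the twiddle factors $e^{-2\pi in_1k_1/N}$, and s DFTs of length r.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] exp(-2*pi*i*n*k/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    return np.exp(-2j * np.pi * np.outer(n, n) / x.size) @ x

def cooley_tukey(x, r, s):
    """One Cooley-Tukey step for N = r*s: s-point DFTs, twiddles, r-point DFTs."""
    x = np.asarray(x, dtype=complex)
    N = r * s
    assert x.size == N
    sub = x.reshape(s, r).T                     # sub[n1, n2] = x[r*n2 + n1]
    Y = np.array([dft(row) for row in sub])     # Y[n1, k1]: r DFTs of length s
    n1 = np.arange(r)[:, None]
    k1 = np.arange(s)[None, :]
    Y = Y * np.exp(-2j * np.pi * n1 * k1 / N)   # twiddle factors
    Z = np.array([dft(col) for col in Y.T])     # Z[k1, k2]: s DFTs of length r
    return Z.T.reshape(N)                       # X[k1 + s*k2] = Z[k1, k2]

x = np.random.default_rng(0).standard_normal(12) + 0j
print(np.allclose(cooley_tukey(x, 3, 4), np.fft.fft(x)))   # True
print(np.allclose(cooley_tukey(x, 4, 3), np.fft.fft(x)))   # True
```

With a single step the cost is already about N(r + s) complex multiplications instead of $N^2$; applying the step recursively to the factors of r and s gives the usual fast algorithm.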
II.5 SAMPLING THEOREMS


Of  the  relative  handful  of  significant  mathematical  theorems  inspired  by  signal  processing 
requirements,  the  most  prominent  is  the  so-called  Sampling  Theorem.  Actually  there  is  now  a 
whole  genre  of  results,  loosely  called  sampling  theorems,  that  pertain  to  the  recovery  of  either 
deterministic  or  random  signals  from  certain  discrete  information  (‘samples’).  A  rather  complete 
survey  of  these  results  (up  to  1977)  has  been  given  by  Jerri  [31].  Here  we  shall  just  look  at  a 
couple  of  prototypes  in  order  to  once  again  point  out  a  group-theoretic  setting. 

For a deterministic signal the intuitive idea is that complete specification of the signal from a
sequence of equally spaced sample values is possible, provided that the signal does not vary too
rapidly. This last phrase is taken to mean that the frequency spectrum of the signal vanishes outside
some bounded interval. The relation between a bound on the spectrum and a sample spacing sufficient for
recovery is a reciprocal one: if the spectrum vanishes outside the interval [−r, r], then the signal is
uniquely specified by its set of sample values taken at the points {ka: k = 0, ±1, ±2, ...}, where
0 < a ≤ π/r. In fact, taking a = π/r, we have the famous sampling formula

$$f(t) = \sum_{k=-\infty}^{\infty} f\!\left(\frac{k\pi}{r}\right)\frac{\sin(rt - k\pi)}{rt - k\pi}. \qquad \text{(II.25)}$$

This formula is associated with the names of Cauchy (1841), Whittaker (1915), Nyquist (1928),
Kotelnikov (1933), and Shannon (1949), who introduced and rediscovered it in various contexts.
In addition to its well-known utility in communications theory (A/D and D/A conversion), formula
(II.25) serves as the basis for a variety of numerical approximation procedures. In this setting the
formula is known as the cardinal series expansion of f; see the survey by Stenger [32].
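A minimal numerical check of the truncated cardinal series (II.25), assuming the test signal $f(t) = (\sin(t/2)/(t/2))^2$, whose spectrum is a triangle supported in [−1, 1] (so r = 1 and the samples are taken at kπ). Note that `numpy.sinc(x)` is the normalized $\sin(\pi x)/(\pi x)$, so the kernel in (II.25) is `np.sinc(r*t/np.pi - k)`. The helper `cardinal_series` is hypothetical.

```python
import numpy as np

def cardinal_series(f, r, t, K):
    """Truncated (II.25): sum over |k| <= K of f(k*pi/r) * sin(r t - k pi)/(r t - k pi)."""
    k = np.arange(-K, K + 1)
    samples = f(k * np.pi / r)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    # np.sinc(x) = sin(pi x)/(pi x), hence sin(r t - k pi)/(r t - k pi) = np.sinc(r t/pi - k).
    return (samples * np.sinc(r * t / np.pi - k)).sum(axis=1)

def f(t):
    """Bandlimited test signal (sin(t/2)/(t/2))**2, spectrum supported in [-1, 1]."""
    return np.sinc(t / (2 * np.pi)) ** 2

t = np.array([0.3, 1.7, -2.4, 10.1])
approx = cardinal_series(f, r=1.0, t=t, K=2000)
print(np.max(np.abs(approx - f(t))))   # small; the truncation error decays like K**-2
```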

Proofs of the sampling formula are Fourier-analytical in nature. The nicest one proceeds
from the Plancherel theorem, which establishes that the Fourier transform $f \mapsto \hat f$ can be extended
from $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$ to a unitary operator on $L^2(\mathbb{R})$. (This theorem, of course, remains valid
if R is replaced by an lca group.) The so-called Paley-Wiener space $PWE_r$ is then defined as the
image under the (inverse) Fourier transform of the subspace of $L^2(\mathbb{R})$ consisting of those functions
which vanish a.e. outside of [−r, r]. As is well known, the space $PWE_r$ is that subspace of $L^2(\mathbb{R})$
whose elements f can be extended to the complex plane as entire functions of
exponential type at most r:

$$PWE_r = \{\, f \in L^2(\mathbb{R}) : |f(z)| \le C\exp(r|z|) \,\}.$$
This Hilbert space of analytic functions, equipped with the reproducing kernel

$$K(z,w) = \frac{r}{\pi}\,\mathrm{sinc}\, r(z - \bar w), \qquad \mathrm{sinc}\,x = \frac{\sin x}{x}, \qquad \text{(II.26)}$$

is thus rich in structure, and its elements serve as mathematical models of 'bandlimited signals.'

The convergence in Equation (II.25) results from the fact that the shifted sinc functions are
(up to normalization) the inverse Fourier transforms of the standard exponential basis

$$\{\exp(i\pi kt/r) : k = 0, \pm 1, \pm 2, \ldots\}$$

on the interval [−r, r]. Because this basis is orthogonal (orthonormal after scaling by $(2r)^{-1/2}$) and the
Fourier transform is unitary, the expansion (II.25) is just an expansion in an orthonormal basis in $PWE_r$.
However, by making use of the reproducing kernel properties of the space $PWE_r$, one can also see that the
convergence in Equation (II.25) is actually uniform on horizontal strips in the complex plane.

None of the foregoing analysis directly involves any group theory. Yet in view of the
translations evident in formula (II.25), we might expect that a group-theoretic version should
exist. Certainly it is not particularly difficult to extend the sampling formula to several variables,
that is, to produce a formula analogous to (II.25) that is valid for bandlimited functions on $\mathbb{R}^n$,
n ≥ 1 [33]. [Actually, such generalizations date back to work of Parzen (1956) and Peterson and
Middleton (1962).] And, in fact, as we shall indicate momentarily, a rather complete
generalization of the formula exists for any lca group [34].

Before doing so, however, we briefly consider another type of extended sampling formula by
asking whether it is possible to replace the sinc function in Equation (II.25) by other functions.
That is, we are looking for expansions of the form

$$f(t) = \sum_{k=-\infty}^{\infty} f(ka)\,\phi(t - ka) \qquad \text{(II.27)}$$

(again valid for $f \in L^2(\mathbb{R})$), with the constraint that $\hat f$ have compact support. Convergence in
Equation (II.27) should at least be in the metric of $L^2(\mathbb{R})$, and perhaps be uniform if we are
lucky. This kind of sampling formula might be more useful than the traditional one because of
more favorable behavior of the function φ. For example, the smoother $\hat\phi$ is, the more rapidly φ
will decay, and then fewer terms on the right side of Equation (II.27) will be needed to
adequately approximate f(t).

The key condition for an expansion of this form to hold is a reciprocal relationship
between the size of the support of $\hat f$ and the sample spacing a. Suppose that we can find open sets
V, W satisfying

$$\operatorname{supp}(\hat f) \subset V \subset \overline{V} \subset W$$

and that W is so small that the translates

$$\{W + 2\pi k/a : k = 0, \pm 1, \pm 2, \ldots\}$$

are pairwise disjoint. Then there exists an infinitely differentiable function ψ such that ψ = 1 on V
and ψ = 0 outside of W. One can now show the following [35]:

(a) the measure of W is finite;

(b) hence $\hat f \in L^1 \cap L^2(\mathbb{R})$, and so f must be continuous;

(c) with φ the (suitably normalized) inverse Fourier transform of ψ, the series in (II.27) converges
uniformly to f.
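The effect of a smoother ψ on the decay of φ can be seen numerically. The sketch below rests on our own assumptions: sample spacing a = 2, a test signal whose spectrum lies in V = [−1, 1], W = (−π/a, π/a), and, in place of an infinitely differentiable ψ, a raised-cosine taper (flat on V, falling to 0 at ±π/a), which is only continuously differentiable but has a closed-form inverse transform. With $\phi = a\check\psi$, where $\check\psi(t) = (2\pi)^{-1}\int\psi(\omega)e^{i\omega t}\,d\omega$, the series (II.27) is exact; the same holds for the classical sinc kernel, whose ψ is the indicator of W.

```python
import numpy as np

a = 2.0                          # sample spacing (assumed)
w_pass = 1.0                     # spectrum of f lies in V = [-w_pass, w_pass]
w_stop = np.pi / a               # psi vanishes outside W = (-pi/a, pi/a)
c = 0.5 * (w_pass + w_stop)      # centre of the cosine taper
b = 0.5 * (w_stop - w_pass)      # half-width of the taper

def phi_sinc(t):
    """Classical kernel: psi = indicator of [-pi/a, pi/a], phi(t) = sin(pi t/a)/(pi t/a)."""
    return np.sinc(np.asarray(t, dtype=float) / a)

def phi_raised_cosine(t):
    """phi = a * inverse transform of a raised-cosine psi (1 on V, 0 outside W)."""
    t = np.asarray(t, dtype=float)
    u = 2.0 * b * t / np.pi
    denom = 1.0 - u ** 2
    near = np.abs(denom) < 1e-6            # removable singularity at |u| = 1 (limit pi/4)
    taper = np.where(near, np.pi / 4, np.cos(b * t) / np.where(near, 1.0, denom))
    return a * (c / np.pi) * np.sinc(c * t / np.pi) * taper

def f(t):
    """Test signal (sin(t/2)/(t/2))**2; its spectrum is a triangle on [-1, 1]."""
    return np.sinc(w_pass * t / (2 * np.pi)) ** 2

k = np.arange(-200, 201)
samples = f(k * a)
t = np.array([0.3, 1.7, -2.4, 5.0])
err = {}
for name, phi in [("sinc", phi_sinc), ("raised cosine", phi_raised_cosine)]:
    approx = (samples * phi(t[:, None] - k * a)).sum(axis=1)   # truncated (II.27)
    err[name] = np.max(np.abs(approx - f(t)))
print(err)
print(abs(phi_sinc(100.5)), abs(phi_raised_cosine(100.5)))   # |t|^-1 versus |t|^-3 decay
```

The raised-cosine kernel decays like $|t|^{-3}$ rather than $|t|^{-1}$, so the tail discarded by truncating its series is bounded by a much smaller sum.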

Returning now to the problem of extending the classical sampling formula (II.25) to a group
context, we let G be an arbitrary lca group with dual group Γ. The role of the sampling points {kπ/r} will be
played by a discrete subgroup H of G, and the goal, for a given $f \in L^2(G)$, is an expansion of the
form

$$f(g) = \sum_{h \in H} f(h)\,\phi(g - h) \qquad \text{(II.28)}$$

for a suitable function φ. As is to be expected, the matter depends on the size of the support of
the group (Fourier) transform $\hat f$. We want it to be small relative to H, and we make this precise
by introducing the annihilator A of H: A = {γ ∈ Γ: ⟨h, γ⟩ = 1 for all h ∈ H}. As an example,
when G = R and H = aZ for some fixed a > 0, then Γ = R and A = (2π/a)Z (here Z is the
group of integers). Now we shall require that $\hat f$ vanish a.e. outside of an open set Ω which has the
property that its translates {Ω + γ: γ ∈ A} are pairwise disjoint. With this setup one can now
establish the following, originally proved in [34] under the further assumption that the subgroup
A also be discrete:

(a) the (Haar) measure of Ω is a positive finite number;

(b) the restrictions of the character functions ⟨h, ·⟩, h ∈ H, to Ω form an
orthonormal basis for $L^2(\Omega)$;

(c) hence, if the function φ is defined as the inverse Fourier transform of the
characteristic function of Ω, then φ is continuous and positive definite, and
the translates φ(· − h), h ∈ H, suitably normalized, form an orthonormal set in $L^2(G)$;

(d) f is continuous, and the expansion (II.28) holds both uniformly on G and in
the metric of $L^2(G)$.

Thus this argument basically parallels that already available for the classical case where G = R. But
it reveals a little more; for instance, that a function on R may have an unbounded spectrum and
yet still be completely determined by its values at the discrete sampling instants {ka}.
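The group formulation can be tried out on a finite group, where every subgroup is discrete. In the sketch below (our illustration, not from the report) G = Z/48 and H = 4Z/48 (12 sampling points), so the annihilator of H in Γ = Z/48 is 12Z/48. Ω may be any set of frequencies whose translates by the annihilator are pairwise disjoint, and φ is taken as M = 4 times the inverse DFT of the indicator of Ω (this normalization follows from the Poisson summation formula on Z/N). Then (II.28) reproduces every f with $\hat f$ supported in Ω, including an Ω that is not an interval. The helper `reconstruct` is hypothetical.

```python
import numpy as np

N, M = 48, 4
H = np.arange(0, N, M)                  # subgroup H = M Z / N Z (12 points)
annihilator = np.arange(0, N, N // M)   # {gamma : exp(2 pi i h gamma / N) = 1 for all h in H}

def reconstruct(f, omega_set):
    """Apply (II.28): f(g) = sum_{h in H} f(h) phi(g - h), phi = M * IDFT(indicator of Omega)."""
    idx = np.asarray(omega_set) % N
    translates = np.concatenate([(idx + alpha) % N for alpha in annihilator])
    assert np.unique(translates).size == translates.size, "translates of Omega overlap"
    indicator = np.zeros(N)
    indicator[idx] = 1.0
    phi = M * np.fft.ifft(indicator)    # phi(g) = (M/N) sum_{gamma in Omega} exp(2 pi i g gamma/N)
    g = np.arange(N)[:, None]
    return (f[H][None, :] * phi[(g - H[None, :]) % N]).sum(axis=1)

rng = np.random.default_rng(3)
results = []
for omega_set in ([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5], [0, 1, 2, 15, 16, 29]):
    idx = np.asarray(omega_set) % N
    fhat = np.zeros(N, dtype=complex)
    fhat[idx] = rng.standard_normal(idx.size) + 1j * rng.standard_normal(idx.size)
    f = np.fft.ifft(fhat)               # so that np.fft.fft(f) is supported in Omega
    results.append(np.allclose(reconstruct(f, omega_set), f))
print(results)                          # [True, True]
```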

Finally, we want to indicate that analogous results are valid for random signals defined on
groups. For stationary processes defined on R, the corresponding sampling formulas go back to
Balakrishnan [36] and Lloyd [37]. In general, if $\{x_g: g \in G\}$ is a weakly stationary process on an
lca group G, as defined in Section II.2, and H is a closed subgroup of G, we say that $\{x_g\}$ is
determined by its samples $\{x_h: h \in H\}$ if the Hilbert space spanned by $\{x_g: g \in G\}$ is the closed linear span of $\{x_h: h \in H\}$.
Then one can show that if the support of the spectral measure of the process $\{x_g\}$ has the
property that its translates by the members of the annihilator of H are pairwise disjoint, the
process is determined by its samples along H in the above sense. This result can in turn be fairly
easily extended to cover (strongly) harmonizable processes on G as defined in Section II.4 [35].

In the classical case (G = R, H = aZ), much effort has been devoted to establishing sampling
formulas similar to (II.25) or (II.27) that would converge, or at least be summable, in one sense
or another to a given random process. For example, suppose the series in (II.25) is truncated to terms
with |k| ≤ N, and f denotes a weakly stationary process on R whose spectral measure is supported
by the interval [−r + γ, r − γ] for some γ > 0 (the 'guard band' assumption). Then for |t| ≤ π/2r
the mean square error $\epsilon_N(t)$ in approximating the random variable f(t) by the truncated series
obeys an inequality of the form $\epsilon_N(t) \le c(t)/N$, with c(t) → 0 as t → 0 [38].

Mean square convergence can in fact be established for sampling expansions of a large class
of nonstationary processes on R which are bandlimited in a suitable sense. See, for example, [35]
again, and its references, for a precise statement. A formula similar to (II.28) is obtained (with
H = aZ, as usual), and shown to converge almost surely as well. For these results the key
technical requirement on the process is that the (two-dimensional) Fourier transform of the
covariance function of the process exist as a distribution in a certain Sobolev space on $\mathbb{R}^2$, and
that its support be contained in an open set whose translates by the points {(2πk/a)(1, 1): k ∈ Z} are
pairwise disjoint.

Returning once more to the general group context of the third paragraph above, we remark
that it is possible to establish a sampling expansion for (strongly) harmonizable processes defined
on G, given certain assumptions on the process and on the subgroup H along which the process is
sampled. Namely, it is supposed that H is an infinite closed discrete cyclic subgroup of G, and
that the spectral measure of the process $\{x_g: g \in G\}$ [as defined below Equation (II.17)] has its
support contained in an open subset of Γ × Γ whose translates by the points (α, α), for α in the
annihilator of H, are pairwise disjoint. The generalized sampling formula then appears as

$$x_g = \lim_{n\to\infty}\sum_{h \in H} S_n(h)\,c_g(h)\,x_h, \qquad g \in G,$$

where the $c_g$ are numerical coefficients, and the $S_n$ are uniformly bounded functions of finite
support on H that converge to the characteristic function of the identity element of H [35]. These
arise from an extension to the group setting of a classical summation method for Fourier series.

It  would  appear  that  some  work  remains  to  be  done  in  this  area  before  the  exact  conditions  for 
a  valid  sampling  formula  are  obtained,  and  the  variety  of  relevant  summability  methods  is 
specified. 
III. DATA PROCESSING OVER FINITE GROUPS


We now want to return to the context of Section II.1, which should be reread at this time,
and work through in greater detail some of the mathematical aspects of linear data processing, as
defined there. Hence, throughout this chapter we will model the observations as a random
element in (equivalently, a probability measure on) a finite dimensional Hilbert space. Our
emphasis will be on the choice of 'good' orthonormal coordinate systems to facilitate the data
processing task at hand. Any such choice leads immediately to a corresponding unitary operator
by which to transform the data. Particular attention will be paid, as already promised in
Section II.1, to those unitary operators which are group (Fourier) transforms wrt some
underlying group structure. The interesting trade-offs here involve the choice of group (whose
structure determines the transform complexity and hence the computational efficiency), the
nature of the signal and noise statistics, the estimation errors or distortion, and the amount of
data compression.

III.1 UNITARY OPERATORS FOR DATA PROCESSING

Following the preceding introduction and the background of Section II.1, we now consider
data in the form of an element y belonging to a finite dimensional Hilbert space Y. The precise
nature of y and Y is not too important initially, but typically y will be a
column vector $(y_1, \ldots, y_N)^T$, so that Y is the space of all complex N-tuples with the usual
algebraic operations and inner product

$$\langle u, v\rangle = \sum_{j=1}^{N} u_j\,\overline{v_j}.$$

Alternatively, y might model an image and so would appear after preprocessing as a matrix $[y_{ij}]$;
then Y would be the space of all such matrices of the same dimensions with the (Hilbert-Schmidt)
norm derived from the inner product

$$\langle u, v\rangle = \operatorname{tr}(uv^*).$$

Of  course,  such  data  could  be  rearranged  (‘stacked’)  to  form  a  column  vector. 

Now, before proceeding to the analysis, we have to consider what we might want to do with,
or learn from, these data. From the several possible generic goals listed in Section I.1, we will
consider here only three:

—  Dimensionality  reduction 

—  Transform  coding 

—  Signal  estimation. 

The first of these, often termed feature selection, consists of replacing y by its projection on
a subspace of Y. We thus represent the data by fewer parameters, often with the intent of
submitting this reduced data to a pattern classifier. The basic issue is then to choose the optimal
subspace, having fixed its dimension and an error criterion. The latter will depend on the prior
information available concerning the data-generating mechanism; for instance, this may be a
known, or estimated, probability distribution.

Now any projection on Y of rank d is unitarily equivalent, in many ways, to the projection of
$\mathbb{C}^N$ onto $\mathbb{C}^d$ which simply drops the last N − d components. So, as we apply various unitary
transforms U: Y → $\mathbb{C}^N$ to the data, the resulting first d components constitute the possible
d-dimensional data reductions. Although these will generally be suboptimal wrt any given error
criterion, some of them may be more efficiently computed than the optimal transform. This will
be the case if U has a fast algorithm, that is, one where the computational effort in obtaining Uy
is less than the expected $O(N^2)$ floating point operations ('flops'). Such a U is generically called a
fast unitary transform ('FUT'); the FFT is the most famous example. The point here is that if the
error made in choosing the leading d components of Uy, where U is a FUT, does not greatly
exceed the minimum possible error, the computational savings may offset the slightly higher
error. Then larger data blocks at a higher sampling rate could be processed, resulting in an
increase in overall system performance.
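The trade-off can be made concrete with a small computation. Under an assumed covariance $C_{jk} = \rho^{|j-k|}$ (an illustrative AR(1)-type model, not one from the report), the sketch computes the mean squared error of keeping the d highest-variance components of Uy and discarding the rest, for the optimal Karhunen-Loève transform (eigenvectors of C), the unitary DFT, and no transform at all. Because U is unitary, this error is the sum of the discarded diagonal entries of $UCU^*$.

```python
import numpy as np

N, d, rho = 32, 8, 0.9
j = np.arange(N)
C = rho ** np.abs(j[:, None] - j[None, :])      # assumed data covariance

def reduction_error(U):
    """MSE of keeping the d largest-variance components of U y (U unitary)."""
    variances = np.real(np.diag(U @ C @ U.conj().T))
    return np.sort(variances)[: N - d].sum()      # sum of the discarded variances

eigvals, eigvecs = np.linalg.eigh(C)              # eigenvalues in ascending order
U_klt = eigvecs.conj().T                          # rows are eigenvectors of C
U_dft = np.fft.fft(np.eye(N)) / np.sqrt(N)        # unitary DFT matrix (fast to apply via FFT)
U_id = np.eye(N)

errors = {name: reduction_error(U) for name, U in
          [("KLT", U_klt), ("DFT", U_dft), ("identity", U_id)]}
print(errors)
```

The KLT error is the smallest possible for every rank d, while the DFT needs only $O(N\log N)$ operations to apply; how much larger its error is depends on how nearly circulant C is.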

Use of Fourier or other unitary transforms as preprocessing for pattern classification dates
back to the 1969-71 time period; the sources [48, 49] may be consulted along with their many
references.

A similar situation occurs in transform coding, one of the principal methods of data
compression [1, 2]. This is a collection of techniques aimed at reducing the amount of signal
space necessary for a given signal, where the components of signal space are, generally, physical
space, time, and bandwidth. Data compression is widely employed in the fields of speech coding,
telemetry, television, facsimile, and data base access. In general, the problem is the efficient
transmission of the information contained in a multidimensional source signal, and the idea is to
eliminate the redundancy in the signal prior to encoding.

A schematic of transform coding is given in Figure III-1. The choice of the first transform is
driven by the requirements of preserving information while returning uncorrelated components.
These prewhitened data may then be individually quantized and encoded. After transmission
through the channel and subsequent decoding, a final transform is applied to restore, as far as
possible, the original signal.

The  major  goal  in  the  transform  coding  technique  of  data  compression  is  to  be  able  to 
employ  fewer  bits  in  quantization  than  would  be  the  case  if  transforms  were  not  first  applied,  or 
if  the  data  were  treated  separately  rather  than  in  blocks.  If  the  number  of  available  bits  is  fixed 
then they should be allocated so as to minimize some measure of overall distortion. In fact, the
selection and efficient quantization of the transformed components for storage or transmission is
at least as important in terms of overall system performance as the choice of blocksize and
transform. However, in this report the latter is of primary interest. Let us just note here that
generally an optimal choice of bits per source symbol, given a certain channel capacity, depends
on the variance of the components of the transformed data vector. If this variance fails to exceed
a threshold, the component is dropped (that is, 0 bits are assigned); otherwise the number of bits
assigned depends on the variance and a distortion function.

Figure III-1. Transform coding schematic.
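As one concrete (and by no means the only) way of turning component variances into a bit allocation, the sketch below uses a greedy rule under an assumed high-resolution distortion model $D_i(b) = \sigma_i^2 2^{-2b}$: each available bit goes, one at a time, to the component whose current distortion is largest, since one more bit removes three quarters of that distortion. Components with small variance may receive 0 bits, i.e., are dropped. The variances are illustrative numbers, not data from the report, and `allocate_bits` is a hypothetical name.

```python
import numpy as np

def allocate_bits(variances, total_bits):
    """Greedy allocation under the assumed model D_i(b) = var_i * 2**(-2*b)."""
    variances = np.asarray(variances, dtype=float)
    bits = np.zeros(variances.size, dtype=int)
    distortion = variances.copy()
    for _ in range(total_bits):
        i = int(np.argmax(distortion))   # the next bit helps this component most
        bits[i] += 1
        distortion[i] /= 4.0             # one more bit quarters its distortion
    return bits

variances = np.array([19.0, 4.3, 4.3, 1.3, 1.3, 0.6, 0.6, 0.05])
bits = allocate_bits(variances, 12)
print(bits)   # [3 2 2 2 1 1 1 0]
```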

The topic of signal estimation was already introduced in Section II.1, where it was
concluded that for zero-mean signal and noise the optimal linear data processor is the Wiener
filter of Equation (II.3). In greater detail, retaining the notation y = s + η of Equation (II.1), we
note that if an estimate

$$\hat s = L\,y \qquad \text{(III.1)}$$
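of the signal s is formed by an arbitrary linear operator L acting on the data y, then the mean
squared error $e_L$ is given by

$$\begin{aligned}
e_L &= E\left(\|s - \hat s\|^2\right) \\
&= \operatorname{tr}\left(C_{s-\hat s}\right) \\
&= \operatorname{tr}\left(C_s - C_{sy}L^* - LC_{sy}^* + LC_yL^*\right) \\
&= \operatorname{tr}(LC_yL^*) - 2\,\operatorname{Re}\operatorname{tr}(LC_{sy}^*) + \operatorname{tr}(C_s) \\
&= \langle LC_y, L\rangle - 2\,\operatorname{Re}\langle L, C_{sy}\rangle + \langle C_s, I\rangle,
\end{aligned}$$

where ⟨A, B⟩ = tr(AB*) is the Hilbert-Schmidt inner product on the space of operators on Y.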

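In this form the error $e_L$ can readily be minimized wrt L; the optimal operator L is the Wiener
operator $W = C_{sy}C_y^{-1}$ as defined in Equations (II.3). Further, the minimum error is given by

$$\begin{aligned}
e_W &= \operatorname{tr}(WC_yW^* + C_s) - 2\,\operatorname{Re}\operatorname{tr}(WC_{sy}^*) \\
&= \operatorname{tr}(C_{sy}C_y^{-1}C_{sy}^* + C_s) - 2\,\operatorname{Re}\operatorname{tr}(C_{sy}C_y^{-1}C_{sy}^*) \\
&= \operatorname{tr}\left[C_s - C_s(C_s + C_\eta)^{-1}C_s\right], \qquad \text{(III.2)}
\end{aligned}$$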

where we have used $C_{sy} = C_s$ and $C_y = C_s + C_\eta$ for uncorrelated signal s and noise η. We note
that these arguments could be generalized to an infinite dimensional setting, provided that η is
interpreted as a second-order weak random variable.
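The derivation above is easy to check numerically. The sketch draws random positive definite signal and noise covariances (an assumption made only for the test), evaluates $e_L$ from the trace expression, and confirms that the Wiener operator $W = C_{sy}C_y^{-1}$ attains the value (III.2) while perturbed operators do worse. Indeed, since $C_yW^* = C_{sy}^*$, the cross terms cancel and $e_{W+D} = e_W + \operatorname{tr}(DC_yD^*)$.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
B = rng.standard_normal((N, N))
Cs = B @ B.T + 0.1 * np.eye(N)          # assumed signal covariance
G = rng.standard_normal((N, N))
Cn = G @ G.T + 0.1 * np.eye(N)          # assumed noise covariance C_eta
Cy, Csy = Cs + Cn, Cs                   # uncorrelated signal and noise

def mse(L):
    """e_L = tr(L Cy L*) - 2 Re tr(L Csy*) + tr(Cs)."""
    return np.real(np.trace(L @ Cy @ L.conj().T) - 2 * np.trace(L @ Csy.conj().T) + np.trace(Cs))

W = Csy @ np.linalg.inv(Cy)                                  # Wiener operator
e_W = np.trace(Cs - Cs @ np.linalg.inv(Cs + Cn) @ Cs)        # formula (III.2)
print(np.isclose(mse(W), e_W))                               # True
perturbed = [mse(W + 0.1 * rng.standard_normal((N, N))) for _ in range(5)]
print(min(perturbed) > e_W)                                  # True
```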

Since the problem of estimating a signal in noise is so fundamental, we make a brief excursion at
this point. Consider the case where s = A(x), that is, we have a linear inverse problem as
discussed following Equation (II.4). If x belongs to a Hilbert space X so that A: X → Y, and
L: Y → X is any potential solution operator, then the mean squared error $e_L$ (averaged over the
noise distribution) can be expressed as

$$e_L = \|x - LAx\|^2 + \operatorname{tr}(LC_\eta L^*), \qquad \text{(III.3)}$$

a formula which remains valid in the infinite dimensional case provided that L is restricted to be
a Hilbert-Schmidt operator. Now, the point is that if no prior information is available concerning
x, there is no way to choose a single operator L to make $e_L$ uniformly small for every x ∈ X. In
order to obtain a unique solution, therefore, some additional constraint must be imposed. A
classical restriction is to make the estimate $\hat x = L(y)$ unbiased, so that $LA = I_X$. Then it results
that

$$L_0 = (A^*C_\eta^{-1}A)^{-1}A^*C_\eta^{-1} \qquad \text{(III.4)}$$

gives the minimum mean squared error. This operator is sometimes known as the Gauss-Markov
estimator. Technical requirements for its existence are that $C_\eta$ be invertible (obviously!) and that
A have a trivial nullspace.

Now, as research on James-Stein and ridge estimators has shown [3, Chapter II], a
willingness to accept some bias can yield a smaller mean square error. Suppose, then, that we
drop the unbiasedness restriction and instead consider, as we have been doing before, a prior
distribution on x. Then the optimal (Wiener) estimator has the form

$$\begin{aligned}
W &= C_xA^*(AC_xA^* + C_\eta)^{-1} \\
&= (C_x^{-1} + A^*C_\eta^{-1}A)^{-1}A^*C_\eta^{-1}. \qquad \text{(III.5)}
\end{aligned}$$

The first formula above remains valid in the infinite dimensional case provided that the
covariance operator of the prior is nuclear (or trace class); otherwise, we restrict to the finite
dimensional case and assume, for the second formula, that this covariance is positive definite,
hence invertible. We can note from this second formula that as prior knowledge of x becomes
more diffuse, in the sense that $\|C_x^{-1}\| \to 0$, the Wiener operator W converges to the Gauss-Markov
operator $L_0$ defined by Equation (III.4).
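Both estimators can be checked on a small random instance; the random matrices and the sizes are assumptions of the sketch. The code verifies that $L_0$ of (III.4) is unbiased ($L_0A = I$), that the two expressions in (III.5) agree, and that W moves toward $L_0$ as the prior covariance $C_x = sI$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 5                                     # x in R^n, y in R^m
A = rng.standard_normal((m, n))                 # trivial nullspace almost surely
G = rng.standard_normal((m, m))
Cn = G @ G.T + 0.5 * np.eye(m)                  # noise covariance C_eta (assumed)
Cn_inv = np.linalg.inv(Cn)

L0 = np.linalg.inv(A.T @ Cn_inv @ A) @ A.T @ Cn_inv          # (III.4)

def wiener_forms(Cx):
    """The two expressions for W in (III.5) (real matrices, so A* = A.T)."""
    first = Cx @ A.T @ np.linalg.inv(A @ Cx @ A.T + Cn)
    second = np.linalg.inv(np.linalg.inv(Cx) + A.T @ Cn_inv @ A) @ A.T @ Cn_inv
    return first, second

print(np.allclose(L0 @ A, np.eye(n)))           # True: unbiased
W1, W2 = wiener_forms(np.diag([1.0, 2.0, 0.5]))
print(np.allclose(W1, W2))                      # True
gaps = [np.linalg.norm(wiener_forms(s * np.eye(n))[1] - L0) for s in (1e2, 1e6)]
print(gaps[1] < gaps[0])                        # True: W -> L0 as the prior diffuses
```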

These remarks aside, let us now return to the earlier case (dim Y < ∞, A = I) with Wiener
filter $W = C_s(C_s + C_\eta)^{-1}$ and error $e_W$ given by Equation (III.2). We note that in general this
operator has no particular structure, except in the important but special case where the noise η is
white, so that $C_\eta$ is a scalar matrix. In that case W is a positive (semi-)definite operator.

Another sufficient condition for W to be hermitian is that $C_\eta$ commute with $C_s$. Failing this, we
fall back on the theorem that the product of a positive semidefinite operator and a positive definite
operator is hermitian if and only if it is normal. From this we conclude that W is normal ⟺ W is
hermitian ⟺ $C_\eta C_s^{-1}$ is hermitian, the last provided that the signal covariance is nonsingular. In any
event, it is not particularly easy to compute with W, and this difficulty leads to the concept of generalized
Wiener filtering.

As in the preceding cases of data reduction and coding we consider a preliminary transformation of the data y by a unitary operator $U: Y \to \mathbb{C}^N$. We then multiply this transformed data by a matrix A and inverse transform. That is, in the notation of Equation (III.1), our estimates have the form

$$ \hat{s} = L y = U^* A U y . \qquad (III.6) $$

Any such transformation, for U ≠ I, is called a generalized Wiener filter [4]. What is interesting here is that if we work through the minimization of the error $e_L = E(\|s - \hat{s}\|^2)$ again, we find that, for fixed U, the optimal choice of A is $UWU^*$, where W is the original Wiener filter, and that the minimum value of $e_L$ is $e_W$, as given in Equation (III.2). Thus the minimum error turns out to be independent of the choice of transform U. In particular, we are free to choose U so that the optimal A has a simple form.

Since we must make some error no matter what we do, the real issue is how to coordinate the choice of U with some suboptimal but simple form of the matrix A. For example, a natural first question is whether, for some U, the associated optimal A is diagonal. Since $A = UWU^*$, an equivalent question is whether the Wiener filter W is normal. As was observed earlier it certainly is, provided that the noise covariance is scalar or, more generally, commutes with the signal covariance. When W is normal by virtue of the noise η being white, any unitary operator that diagonalizes it is at the same time one diagonalizing the signal covariance $C_s$; these are called (discrete) Karhunen-Loève transforms (DKLTs) of the signal.
Let us now summarize the pros and cons of the use of the DKLT for signal estimation; unfortunately, there are more of the latter. First, it need not always exist: we have to assume commutativity of noise and signal covariance operators. Second, it changes with any change in the signal statistics. Third, there is no reason to expect it to be an FUT, in general, so that as the data blocksize increases we have an increasingly lengthy eigenvector computation to accomplish. For these reasons the role of the DKLT in data processing is more that of a benchmark than that of a viable numerical procedure.

This being said, we can now suggest the main concepts of suboptimal Wiener filtering. We consider the generalized Wiener filters of Equation (III.6), with U defined independently of the signal statistics and A chosen to be ‘simple’. This last term is deliberately a little vague; we have in mind that A should be diagonal or at least nearly diagonal, with few nonzero off-diagonal terms. Further, U should be an FUT so that the computational effort is reduced. The essential trade-off, then, is between filter complexity and error, for different statistical signal environments.
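The trade-off can be illustrated with a minimal sketch, assuming a first-order Markov signal covariance $C_s = [\rho^{|i-j|}]$ and white noise $C_\eta = \sigma^2 I$ (values chosen only for illustration). For a diagonal $A = \mathrm{diag}(a_k)$ the error of (III.6) decouples as $\sum_k |1 - a_k|^2 d_k + \sigma^2 |a_k|^2$, where $d_k$ are the diagonal entries of $U C_s U^*$, and is minimized by $a_k = d_k/(d_k + \sigma^2)$. The code compares this best diagonal error for the DFT and for U = I with the unrestricted Wiener error, which for white noise equals $\sigma^2\,\mathrm{tr}\,W$.

```python
import cmath

# Assumed setting (illustrative values): first-order Markov signal covariance
# C_s = [rho^|i-j|], white noise C_eta = sigma2 * I, blocklength N.
N, rho, sigma2 = 8, 0.9, 0.5
C_s = [[rho ** abs(i - j) for j in range(N)] for i in range(N)]

def inverse(X):
    """Gauss-Jordan inversion with partial pivoting (X assumed nonsingular)."""
    n = len(X)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(X)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def best_diagonal_error(d):
    # For A = diag(a_k): e_L = sum_k |1 - a_k|^2 d_k + sigma2 |a_k|^2, where
    # d_k = (U C_s U*)_kk; the minimizer a_k = d_k / (d_k + sigma2) leaves this sum.
    return sum(dk * sigma2 / (dk + sigma2) for dk in d)

# Diagonal of U C_s U* for the unitary DFT U[k][m] = exp(-2 pi i k m / N) / sqrt(N)
d_dft = [sum(C_s[m][n] * cmath.exp(-2j * cmath.pi * k * (m - n) / N)
             for m in range(N) for n in range(N)).real / N for k in range(N)]
d_identity = [C_s[k][k] for k in range(N)]      # U = I, i.e. no transform

# Unrestricted Wiener error for white noise: e_W = sigma2 * tr(C_s (C_s + sigma2 I)^-1)
M = inverse([[C_s[i][j] + (sigma2 if i == j else 0.0) for j in range(N)] for i in range(N)])
e_W = sigma2 * sum(C_s[i][k] * M[k][i] for i in range(N) for k in range(N))

e_dft = best_diagonal_error(d_dft)
e_identity = best_diagonal_error(d_identity)
print(f"e_W = {e_W:.4f}, diagonal A after DFT = {e_dft:.4f}, no transform = {e_identity:.4f}")
```

The ordering $e_W \le$ (DFT, diagonal) $<$ (no transform, diagonal) is what one expects: a fast transform plus a diagonal filter recovers part, but not all, of the gap to the full Wiener filter.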

With  this  extended  motivation  for  the  use  of  unitary  transforms,  particularly  FUTs,  for 
several  generic  data  processing  purposes  behind  us,  let  us  think  a  little  about  such  operators  from 
a  general  point  of  view:  what  they  do,  or  ought  to  do,  to  be  useful,  and  how  they  can  be 
constructed. First of all, we recall that unitary transforms are, in effect, changes of basis. That is, if $U: Y \to \mathbb{C}^N$ is unitary and we write

$$ U(x) = (\alpha_1, \ldots, \alpha_N) , $$

then each $\alpha_i$ depends linearly on x, so that there is a vector $u_i \in Y$ with

$$ \alpha_i = \langle x, u_i \rangle . $$

Since, by assumption,

$$ \|x\|^2 = \|U(x)\|^2 = \sum_{i=1}^{N} |\alpha_i|^2 , \qquad (III.7) $$

it easily follows that $\{u_1, \ldots, u_N\}$ is an orthonormal basis (for short, a frame) in Y. So, the effect of U is to pick out the coordinates of an element of Y wrt a particular frame. We will call these coordinates the spectral coordinates of x wrt $u_1, \ldots, u_N$.
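The change-of-basis view can be checked directly. The sketch below assumes the normalized DFT vectors $u_i[n] = w^{in}/\sqrt{N}$ as the frame (any frame would do) and an arbitrary data vector; it computes the spectral coordinates $\alpha_i = \langle x, u_i \rangle$, confirms the energy identity (III.7), and rebuilds x as $\sum_i \alpha_i u_i$.

```python
import cmath

N = 4
w = cmath.exp(2j * cmath.pi / N)
# Assumed frame: normalized DFT vectors u_i[n] = w^(i n) / sqrt(N)
frame = [[w ** (i * n) / N ** 0.5 for n in range(N)] for i in range(N)]

def inner(x, y):
    # <x, y> = sum_n x[n] conj(y[n]), linear in the first argument
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

x = [1.0, 2.0, -1.0, 0.5]                  # an arbitrary data vector in Y = C^4
alpha = [inner(x, u) for u in frame]        # spectral coordinates alpha_i = <x, u_i>

energy_x = sum(abs(v) ** 2 for v in x)
energy_alpha = sum(abs(a) ** 2 for a in alpha)   # equal, as in (III.7)
# Reconstruction from spectral coordinates: x = sum_i alpha_i u_i
x_back = [sum(alpha[i] * frame[i][n] for i in range(N)) for n in range(N)]
print(energy_x, energy_alpha)
```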

When x is a data vector obeying some zero-mean probability law on Y, its spectral coordinates wrt the frame $\{u_i\}$ are random variables on Y. In this case the numbers

$$ \gamma_i = E(|\langle x, u_i \rangle|^2) , \qquad i = 1, \ldots, N \qquad (III.8) $$

constitute the power spectrum of x wrt the given frame. Although the relative size of the $\gamma_i$ will vary from one frame to another, we have the identity

$$ \sum_{i=1}^{N} \gamma_i = E(\|x\|^2) \qquad (III.9) $$

independently of frame. Note that the numbers $\gamma_i$ can also be interpreted as the values $\langle C_x u_i, u_i \rangle$, where $C_x$ is the covariance operator of the random variable x. If, in particular, the frame $\{u_i\}$ is chosen so as to diagonalize $C_x$, then the $\gamma_i$ are just the eigenvalues of this operator, and the number defined in Equation (III.9) is seen to be $\mathrm{tr}(C_x)$. This particular set $\{\gamma_i\}$ is sometimes called the normal power spectrum, and a corresponding frame $\{u_i\}$ of eigenvectors is a Karhunen-Loève basis for Y (relative to the law of x).
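For a concrete instance of (III.8) and (III.9), the sketch below takes an assumed first-order Markov covariance $C_x = [\rho^{|i-j|}]$ and computes $\gamma_i = \langle C_x u_i, u_i \rangle$ for the standard frame and for the normalized DFT frame. The individual spectra differ, but both sum to $\mathrm{tr}(C_x)$.

```python
import cmath

N, rho = 8, 0.9
# Assumed covariance C_x of a zero-mean first-order Markov vector
C = [[rho ** abs(i - j) for j in range(N)] for i in range(N)]

def power_spectrum(C, frame):
    # gamma_i = <C u_i, u_i> = sum_{m,n} C[m][n] u_i[n] conj(u_i[m])
    return [sum(C[m][n] * u[n] * u[m].conjugate() for m in range(N) for n in range(N)).real
            for u in frame]

standard = [[complex(i == n) for n in range(N)] for i in range(N)]
dft = [[cmath.exp(2j * cmath.pi * i * n / N) / N ** 0.5 for n in range(N)] for i in range(N)]

gamma_std = power_spectrum(C, standard)
gamma_dft = power_spectrum(C, dft)
trace = sum(C[i][i] for i in range(N))
print([round(g, 3) for g in gamma_std], sum(gamma_std))
print([round(g, 3) for g in gamma_dft], sum(gamma_dft), trace)
```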

At this point we know that unitary transforms preserve the energy of observed data vectors [this is the import of Equation (III.7)], and also the total signal power [the value in Equation (III.9)]. This latter number is sometimes referred to also as the (statistical) bandwidth of the data. We may also note, without proof, that unitary transforms preserve the relative, or cross-, entropy between a pair of random vectors in Y. That is, if x, y are random vectors in Y with distributions p, q, respectively, whose cross-entropy H(p, q) as defined in Equation (I.6) is finite, then

$$ H(x, y) = H(p, q) = H(Ax, Ay) , $$

where A is any unitary or, more generally, nonsingular linear operator on Y. So, in this precise sense, unitary transforms preserve the information about one random vector contained in another. This sort of result goes back to the early work of Shannon, Kolmogorov, Gelfand and Yaglom in information theory [5]; see also the more recent and more detailed work of Rosenblatt-Roth [6].

We  are  still  left  with  the  task  of  selecting  a  unitary  transform  to  fit  a  particular  data 
processing  task,  or,  equivalently,  a  suitable  set  of  spectral  coordinates.  We  have  seen  that  we 
can’t  distinguish  between  such  transforms  on  the  basis  of  power  or  information-theoretic  criteria. 
What  we  can  expect,  however,  is  to  differentiate  on  the  basis  of  the  statistical  behavior  of  the 
individual spectral coordinates. Specifically, the desirability of a unitary transform increases with its ability to decorrelate these coordinates and to pack most of the signal power into a small number of them. This latter phrase means roughly that, for M much less than N and the $\gamma_i$ arranged in decreasing order, the sum $\gamma_1 + \cdots + \gamma_M$ should be near to the total signal power of Equation (III.9).

This last goal is an example of a data processing task, the performance of which can be measured by a function F of the power spectrum coordinates, namely,

$$ F(\gamma_1, \ldots, \gamma_N) = \sum_{j=1}^{M} \gamma_j . $$

The goal is achieved by a unitary operator U for which F is maximized. Now it is fairly direct to show that if $\{u_i\}$ is a Karhunen-Loève basis for Y and $\{v_i\}$ is any other frame, then the power spectrum $\lambda = (\lambda_1, \ldots, \lambda_N)$ wrt $\{v_i\}$ is related to the normal power spectrum $\gamma = (\gamma_1, \ldots, \gamma_N)$ associated with $\{u_i\}$ by

$$ \lambda = A \gamma , $$

where A is the orthostochastic matrix whose (j, i) entry is $|\langle u_i, v_j \rangle|^2$. By Birkhoff's theorem, A is a convex combination of permutation matrices:

$$ A = \sum_k \alpha_k P_k , $$

and so

$$ \lambda = \sum_k \alpha_k (P_k \gamma) . $$

That is, all possible power spectra of a given second order probability measure lie in the convex hull of the permutations of the normal power spectrum. The import of this observation, which goes back to J. Pearl [7], is that the merit function F is maximized by a DKLT of the data. Of course, this nice theoretical result ignores computational realities.
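In two dimensions the majorization argument can be checked by rotating the frame. In the sketch below, with an assumed 2 × 2 covariance of correlation ρ, the power spectrum wrt each rotated frame is computed directly and also as the orthostochastic matrix applied to the normal spectrum $(1+\rho, 1-\rho)$. The first coordinate, which is the merit function F with M = 1, never exceeds $1 + \rho$, and reaches it at the Karhunen-Loève frame (rotation by π/4).

```python
import math

rho = 0.6                                   # assumed correlation coefficient
C = [[1.0, rho], [rho, 1.0]]                # covariance of a 2-vector
s = 1 / math.sqrt(2)
kl = [(s, s), (s, -s)]                      # Karhunen-Loeve frame of C
gamma = [1 + rho, 1 - rho]                  # its normal power spectrum

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def quad(C, v):
    # <C v, v> for a real vector v
    return sum(C[m][n] * v[m] * v[n] for m in range(2) for n in range(2))

def rotated_frame(theta):
    c, t = math.cos(theta), math.sin(theta)
    return [(c, t), (-t, c)]

results = []
for k in range(13):
    frame = rotated_frame(k * math.pi / 12)
    direct = [quad(C, v) for v in frame]                         # power spectrum wrt frame
    A = [[dot(kl[i], v) ** 2 for i in range(2)] for v in frame]  # orthostochastic matrix
    mixed = [sum(A[j][i] * gamma[i] for i in range(2)) for j in range(2)]
    results.append((direct, mixed, A))

best_first = max(direct[0] for direct, _, _ in results)
print(best_first, 1 + rho)
```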

A second example of such merit functions occurs when the processing goal is efficient transform coding; for example, the spectral coordinates are to be transmitted through a binary channel of some fixed capacity C bits/symbol. In terms of a given distortion function φ, decreasing and convex, and an assignment of $B_j$ bits to the jth coordinate (the latter being assumed independent here), the merit function F becomes

$$ F(\gamma_1, \ldots, \gamma_N) = \min \Big\{ \sum_j \phi(B_j) \, \gamma_j \; : \; B_j \ge 0, \ \sum_j B_j = C \Big\} . $$

From this it follows that F is, in fact, a pointwise minimum of linear functions of $(\gamma_1, \ldots, \gamma_N)$, and hence a concave function of the power spectrum [8].
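When the bits are restricted to integers (an assumption not made in the text), this merit function can be evaluated by a simple greedy rule. The sketch below also assumes the high-rate distortion model $\phi(B) = 2^{-2B}$, which is decreasing and convex. Because the objective is separable and each term is convex in $B_j$, handing out bits one at a time to the coordinate with the largest distortion reduction attains the integer minimum; the code confirms this by brute force on a small hypothetical spectrum.

```python
from itertools import product

def phi(b):
    # Assumed distortion function: decreasing and convex (high-rate model)
    return 2.0 ** (-2 * b)

def greedy_allocation(gamma, capacity):
    # Give bits one at a time to the coordinate with the largest distortion drop.
    bits = [0] * len(gamma)
    for _ in range(capacity):
        gains = [g * (phi(b) - phi(b + 1)) for g, b in zip(gamma, bits)]
        bits[gains.index(max(gains))] += 1
    return bits

def distortion(gamma, bits):
    return sum(phi(b) * g for g, b in zip(gamma, bits))

gamma = [4.0, 2.0, 1.0, 0.5]   # hypothetical power spectrum
C = 6                          # hypothetical channel budget (bits per block)
bits = greedy_allocation(gamma, C)
# Brute-force minimum over all integer allocations using exactly C bits
best = min(distortion(gamma, b) for b in product(range(C + 1), repeat=len(gamma))
           if sum(b) == C)
print(bits, distortion(gamma, bits), best)   # [2, 2, 1, 1] 0.75 0.75
```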

Another kind of merit function, not of the above form, is needed when trying to rank unitary transforms by their ability to decorrelate the spectral coordinates of a probability measure or, perhaps, a ‘small’ class of such. If, as usual, we deal with a covariance matrix $C_x$ and a unitary transform U, the spectral coordinates obey a probability law whose covariance matrix is $\Lambda = U C_x U^*$. Its diagonal entries $\lambda_{ii}$ are the components of the power spectrum. Hence an appropriate figure-of-merit would be some measure of the magnitude of the off-diagonal entries $\lambda_{ij}$; for instance

$$ F(\Lambda) = \frac{1}{N} \sum_{i \ne j} |\lambda_{ij}|^2 . \qquad (III.10) $$
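To make (III.10) concrete, the sketch below reads it as $F(\Lambda) = \frac{1}{N}\sum_{i \ne j} |\lambda_{ij}|^2$ (our reading of the normalization) and compares an assumed first-order Markov covariance with its DFT-domain version $\Lambda = U C U^*$.

```python
import cmath

def markov_cov(N, rho):
    # first-order Markov covariance [rho^|i-j|]
    return [[rho ** abs(i - j) for j in range(N)] for i in range(N)]

def dft_matrix(N):
    # Unitary DFT: U[k][m] = exp(-2 pi i k m / N) / sqrt(N)
    return [[cmath.exp(-2j * cmath.pi * k * m / N) / N ** 0.5 for m in range(N)]
            for k in range(N)]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def adjoint(X):
    return [[complex(X[j][i]).conjugate() for j in range(len(X))] for i in range(len(X[0]))]

def offdiag_merit(L):
    # F(Lambda) = (1/N) * sum_{i != j} |lambda_ij|^2
    N = len(L)
    return sum(abs(L[i][j]) ** 2 for i in range(N) for j in range(N) if i != j) / N

N, rho = 8, 0.9
C = markov_cov(N, rho)
U = dft_matrix(N)
Lam = matmul(matmul(U, C), adjoint(U))
print("no transform:", offdiag_merit(C), " DFT:", offdiag_merit(Lam))
```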

This  last  data  processing  task,  namely,  to  choose  a  unitary  transform  to  approximate  a 
DKLT  of  a  particular  covariance  (or  class  thereof),  has  led,  over  the  past  decade,  to  some 
interesting  work  [9,  10]  on  asymptotic  properties  of  various  spectral  representations,  which  we 
summarize  briefly  here.  The  general  idea,  as  in  so  much  of  statistical  theory,  is  to  study  the 
behavior  of  certain  approximations  to  ‘truth’  as  the  sample  size  (blocklength,  here)  becomes 
infinite. 

Specifically, suppose given a sequence $\{U_N : N = 1, 2, \ldots\}$ of N-dimensional unitary transforms, and, for each N, a family $\mathcal{C}_N$ of positive-semidefinite N-dimensional matrices. Each class $\mathcal{C}_N$ is intended to consist of possible covariances of observed data. Also, let $F_N$ be a nonnegative function defined on the space of all N × N matrices such that $F_N(A) = 0$ if A is diagonal. $F_N$ is intended to measure how far a matrix is from being diagonal; it could be defined, for instance, by Equation (III.10). We can then agree to say that the transform sequence asymptotically decorrelates the family if, for each sequence $\{C_N\}$, where $C_N \in \mathcal{C}_N$,

$$ \lim_{N \to \infty} F_N(U_N C_N U_N^*) = 0 . \qquad (III.11) $$

Thus, if $F_N$ is defined by Equation (III.10) (hereafter referred to as the ‘standard case’), then a sufficient condition for (III.11) to hold is that each off-diagonal entry $\lambda_{ij}$ of $U_N C_N U_N^*$ satisfy $|\lambda_{ij}|^2 = o(1/N)$, uniformly in $i \ne j$, as $N \to \infty$.

The underlying motivation for the foregoing abstract set-up is the desire to rapidly process large blocks of data by first applying an FUT to the data vector and then treating the resulting spectral coordinates independently for coding/compression purposes. Since the exact data statistics are rarely known, we must expect the processing to be effective over a class $\mathcal{C}_N$ of possible covariance matrices. The prototypical example is that where the data is obtained as a segment of a discrete weakly stationary process, so that $\mathcal{C}_N$ consists of Toeplitz matrices, and $\{U_N\}$ is the sequence of N-dimensional DFTs. Then it is known [11] that if the process is restricted to have a square-summable covariance sequence, the DFT sequence will asymptotically decorrelate all the corresponding (Toeplitz) covariance matrices.
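The asymptotic decorrelation of Toeplitz covariances by the DFT can be observed numerically. Using the same assumed normalization of the off-diagonal merit, $F_N(\Lambda) = \frac{1}{N}\sum_{i \ne j} |\lambda_{ij}|^2$, the sketch below evaluates $F_N(U_N C_N U_N^*)$ for first-order Markov covariances with ρ = 0.5 and increasing N. A moderate ρ is used because for ρ close to 1 the decrease only becomes visible once N is large compared with the correlation length $1/(1-\rho)$.

```python
import cmath

def markov_cov(N, rho):
    # first-order Markov (Toeplitz) covariance [rho^|i-j|]
    return [[rho ** abs(i - j) for j in range(N)] for i in range(N)]

def dft_matrix(N):
    # Unitary DFT: U[k][m] = exp(-2 pi i k m / N) / sqrt(N)
    return [[cmath.exp(-2j * cmath.pi * k * m / N) / N ** 0.5 for m in range(N)]
            for k in range(N)]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def adjoint(X):
    return [[complex(X[j][i]).conjugate() for j in range(len(X))] for i in range(len(X[0]))]

def offdiag_merit(L):
    # (1/N) * sum_{i != j} |lambda_ij|^2
    N = len(L)
    return sum(abs(L[i][j]) ** 2 for i in range(N) for j in range(N) if i != j) / N

rho = 0.5
sizes = [8, 16, 32, 64]
values = []
for N in sizes:
    U = dft_matrix(N)
    values.append(offdiag_merit(matmul(matmul(U, markov_cov(N, rho)), adjoint(U))))
for N, v in zip(sizes, values):
    print(N, round(v, 4))
```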

In general, if $C \in \mathcal{C}_N$ and $U_N C U_N^* \approx D$ (a diagonal matrix) in the sense that $F_N(U_N C U_N^*)$ is small, and we set $C' = U_N^* D U_N$, then C' is diagonal wrt the frame associated with $U_N$ and $C' \approx C$, at least in the standard case, by unitary equivalence. So, asymptotic decorrelation of the family $\{C_N\}$ by the transform sequence $\{U_N\}$ is equivalent to ever better approximation of the matrices $C_N$ by matrices that are diagonal in the corresponding frames. When $U_N$ is the N-dimensional DFT, the corresponding class of diagonalizable matrices consists of circulants.

So far we have not mentioned the rate of convergence in Equation (III.11). At least in the standard case this rate is interesting for two reasons. First, for a given family $\{C_N\}$ of covariances, convergence rates allow us to compare the performance of different transform sequences. This might permit us to decide, for instance, whether certain data might best be processed with Fourier, Walsh, Haar, cosine, or yet other transforms. Second, by assigning numerical performance criteria for various data processing tasks, we might be able to bound or estimate the performance degradation resulting from the use of these various fast but suboptimal approximations to the DKLT.

As an illustration, suppose that $\mathcal{C}_N$ is the class of covariance matrices corresponding to first order Markov processes. That is, $C \in \mathcal{C}_N$ means

$$ C = [\rho^{|i-j|}] , \qquad (III.12) $$

for $1 \le i, j \le N$ and $0 < \rho < 1$; ρ originates as the correlation coefficient between adjacent samples. Here the spectrum and Karhunen-Loève basis can be explicitly obtained [4]. Further, it can be shown that not only does the DFT sequence asymptotically decorrelate these covariances, but so does the popular discrete cosine transform (DCT) [13], and in fact it is ‘better’ than the DFT in this context. That is, while the rate of convergence in Equation (III.11) is the same for both transform sequences, the DCT error is strictly less than the DFT error, for every choice of ρ [12]. A related remark here is that the DCT is now known to arise as the limiting form (as ρ → 1) of the Karhunen-Loève basis for C in Equation (III.12) [14].
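A quick numerical comparison in the spirit of [12], under the same assumed merit $\frac{1}{N}\sum_{i \ne j} |\lambda_{ij}|^2$, uses the orthonormal DCT-II basis $U[k][m] = a_k \cos(\pi(2m+1)k/2N)$, with $a_0 = \sqrt{1/N}$ and $a_k = \sqrt{2/N}$ otherwise (assumed to be the DCT meant here), alongside the unitary DFT. It illustrates a single case, N = 8 and ρ = 0.9.

```python
import cmath
import math

def markov_cov(N, rho):
    return [[rho ** abs(i - j) for j in range(N)] for i in range(N)]

def dft_matrix(N):
    # unitary DFT
    return [[cmath.exp(-2j * cmath.pi * k * m / N) / N ** 0.5 for m in range(N)]
            for k in range(N)]

def dct_matrix(N):
    # orthonormal DCT-II
    return [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
             * math.cos(math.pi * (2 * m + 1) * k / (2 * N)) for m in range(N)]
            for k in range(N)]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def adjoint(X):
    return [[complex(X[j][i]).conjugate() for j in range(len(X))] for i in range(len(X[0]))]

def offdiag_merit(L):
    # (1/N) * sum_{i != j} |lambda_ij|^2
    N = len(L)
    return sum(abs(L[i][j]) ** 2 for i in range(N) for j in range(N) if i != j) / N

def transformed_merit(U, C):
    return offdiag_merit(matmul(matmul(U, C), adjoint(U)))

N, rho = 8, 0.9
C = markov_cov(N, rho)
F_dft = transformed_merit(dft_matrix(N), C)
F_dct = transformed_merit(dct_matrix(N), C)
print(f"DFT: {F_dft:.4f}  DCT: {F_dct:.4f}")
```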


At  present,  the  DFT,  the  DCT,  and  several  other  kinds  of  unitary  transforms  have  been 
embedded  into  some  general  theories.  We  have  in  mind  here  on  the  one  hand  the  Gauss-Jacobi 
transforms  of  Yemini  and  Pearl  [10],  which  are  based  on  the  classical  convergence  of  Gaussian 
quadratures  derived  from  orthogonal  polynomials,  and,  on  the  other  hand,  the  sinusoidal 
transforms  of  Jain  [15].  These  latter  are  the  eigenvector  frames  of  a  parameterized  family  of 
Jacobi-like  matrices  of  the  form 


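$$ J(k_1, k_2, k_3) = \begin{pmatrix} 1 - k_1\alpha & -\alpha & & & k_3\alpha \\ -\alpha & 1 & -\alpha & & \\ & \ddots & \ddots & \ddots & \\ & & -\alpha & 1 & -\alpha \\ k_3\alpha & & & -\alpha & 1 - k_2\alpha \end{pmatrix} $$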


These sources should be consulted for pertinent details. We are now going to turn our attention in a different direction, to a consideration of unitary transforms arising from a group-theoretic setting.




III.2  GROUP  ALGEBRAS  AND  REPRESENTATIONS 


Having  attempted  to  motivate  the  use  of  unitary  operators  in  various  data  processing 
algorithms,  we  now  specialize  to  the  case  where  these  operators  are  the  group  transforms  of 
various  finite  groups.  In  this  section  we  will  quickly  review  the  relevant  group  theory  background 
and  then  look  at  the  structure  of  group  transforms  in  the  next  section. 

Recall that we are dealing with data presented as a random element of an N-dimensional Hilbert space Y. In the previous section we pointed out the connection between unitary operators on Y and frames in Y; in particular, a unitary map from Y onto $\mathbb{C}^N$ gives the (spectral) coordinate vector associated with a particular frame B in Y. This association effectively realizes Y as the discrete sequence space $\ell^2(B)$. Now the key idea underlying the rest of this report is that the frame B may have some additional structure; for instance, it might be a group.

Here are two simple examples when N = 4. First of all, there are only two distinct (nonisomorphic) groups of order 4: the cyclic group $C_4$ and the Klein 4-group $D_2$. Both are abelian; indeed, any group of order p or $p^2$, p a prime, is necessarily abelian. Assume that $Y = \mathbb{C}^4$ and consider the frame $B_1 = \{u_1, u_2, u_3, u_4\}$, where


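$$ u_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 1 \\ w \\ w^2 \\ w^3 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 1 \\ w^2 \\ w^4 \\ w^6 \end{pmatrix}, \quad u_4 = \begin{pmatrix} 1 \\ w^3 \\ w^6 \\ w^9 \end{pmatrix} $$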

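and $w = \exp(2\pi i/N) = i$, here. (To keep the set closed under the multiplication below, we suppress the factor 1/2 that normalizes these vectors.) Under the operation of componentwise multiplication, this frame is easily seen to be a group isomorphic to $C_4$. Similarly, the frame $B_2 = \{v_1, v_2, v_3, v_4\}$ defined by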


$$ v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \quad v_4 = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix} $$

is again a group under componentwise multiplication, this time isomorphic to $D_2$.
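Both group-frames can be checked mechanically. The sketch below builds the unnormalized vectors $u_k[n] = w^{kn}$ and takes the $v_k$ to be the ±1 Walsh patterns (as assumed in the code). It verifies that componentwise products stay in each set, and distinguishes the two groups by element orders: $C_4$ has elements of order 4, while every nonidentity element of $D_2$ has order 2.

```python
import cmath

N = 4
w = cmath.exp(2j * cmath.pi / N)      # w = i when N = 4

# Unnormalized frames: u_k[n] = w^(k n), and the +/-1 Walsh patterns (assumed order)
B1 = [tuple(w ** (k * n) for n in range(N)) for k in range(N)]
B2 = [(1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1)]

def times(x, y):
    # componentwise multiplication
    return tuple(a * b for a, b in zip(x, y))

def index_of(frame, x, tol=1e-9):
    # position of x in the frame, or None if x is not in it
    for i, u in enumerate(frame):
        if all(abs(a - b) < tol for a, b in zip(u, x)):
            return i
    return None

def cayley_table(frame):
    return [[index_of(frame, times(x, y)) for y in frame] for x in frame]

def element_order(frame, i):
    # smallest m >= 1 with x^m equal to the identity frame[0] = (1, 1, 1, 1)
    x, m = frame[i], 1
    while index_of(frame, x) != 0:
        x, m = times(x, frame[i]), m + 1
    return m

table1, table2 = cayley_table(B1), cayley_table(B2)
orders1 = sorted(element_order(B1, i) for i in range(N))
orders2 = sorted(element_order(B2, i) for i in range(N))
print(orders1, orders2)   # [1, 2, 4, 4] [1, 2, 2, 2]
```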

Now, while these ‘group-frames’ may seem more or less natural, we can, in fact, realize any group of order N as a frame in Y. Namely, let G be such a group and let its Haar measure $m_G$ be normalized so that $m_G(G) = 1$; that is, $m_G(\{g\}) = 1/N$ for each $g \in G$. Then the space $L^2(G)$ is N-dimensional and hence congruent with Y. It contains the frame $\{\sqrt{N} e_g : g \in G\}$, where $e_g$ is the indicator function of $\{g\}$. The image of this frame in Y under any congruence T is again a frame, and it is claimed that this frame can be given a group structure under which it is isomorphic to G. This claim easily follows from the facts that the space $L^2(G)$ is an algebra under convolution as multiplication, and that, with the normalized convolution defined below, $e_g * e_h = N^{-1} e_{gh}$ for $g, h \in G$, so that the $e_g$ multiply like the group elements up to a fixed scalar factor. Then the group structure on the frame $\{\sqrt{N} e_g\}$ is carried over to its image frame in Y so that T becomes an isomorphism.
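The identity behind this claim can be checked on the smallest nonabelian group. The sketch below (our choice of example) realizes $S_3$ as permutation tuples and uses the convolution normalized by the Haar measure $m_G(\{g\}) = 1/N$. It confirms that $e_g * e_h = e_{gh}/N$ for all pairs, and that the scaled indicators $\sqrt{N} e_g$ are orthonormal.

```python
from itertools import permutations

G = list(permutations(range(3)))   # S_3 as permutation tuples; G[0] is the identity
N = len(G)

def mul(g, h):
    # group product (g h)(x) = g(h(x)), i.e. composition of permutations
    return tuple(g[h[x]] for x in range(3))

def inv(g):
    r = [0] * 3
    for x, gx in enumerate(g):
        r[gx] = x
    return tuple(r)

def convolve(f1, f2):
    # (f1 * f2)(g) = (1/N) sum_h f1(g h^-1) f2(h), using m_G({g}) = 1/N
    return {g: sum(f1[mul(g, inv(h))] * f2[h] for h in G) / N for g in G}

def delta(g):
    # e_g: the indicator function of the singleton {g}
    return {x: 1.0 if x == g else 0.0 for x in G}

def inner(f1, f2):
    return sum(f1[x] * f2[x] for x in G) / N      # real-valued functions here

frame = [{x: N ** 0.5 * v for x, v in delta(g).items()} for g in G]

products_ok = all(
    abs(convolve(delta(g), delta(h))[x] - delta(mul(g, h))[x] / N) < 1e-12
    for g in G for h in G for x in G)
print(products_ok)
```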
The upshot of these observations is that it is possible to model the data as a random element of rather special Hilbert spaces of the form $L^2(G)$, where the order of G is the blocklength of the data. As was noted back in Section I.2, while Hilbert spaces of the same dimension are abstractly equivalent, they individually possess widely different realizations as sets of functions or operators. In the present case the spaces $L^2(G)$ have a very rich structure, going far beyond that permitted by the usual Hilbert space axioms. This structure can be revealed by several different approaches: group representations, harmonic analysis, and Banach algebras, to name three. These theories are, of course, very powerful and extensive, and serve to similarly describe the structure of $L^2(G)$ for general compact topological groups G, and for many other groups as well. In this section we will just review those structural aspects that seem relevant to data processing applications.

First, as earlier noted, $L^2(G)$ is an algebra under convolution multiplication:

$$ f_1 * f_2(g) = \frac{1}{N} \sum_{h \in G} f_1(g h^{-1}) \, f_2(h) . $$

Thus, since G is finite, $L^2(G)$ is set-theoretically identical with the so-called ‘group algebra’ of G. When G is not of finite order, that term is more commonly applied to the space $L^1(G)$. There is also an involution $f \mapsto f^*$ defined by

$$ f^*(g) = \overline{f(g^{-1})} . $$

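These operations are tied together with the inner product

$$ \langle f_1, f_2 \rangle = \int_G f_1 \overline{f_2} \, dm_G = \frac{1}{N} \sum_{g \in G} f_1(g) \overline{f_2(g)} $$

by the formula

$$ \langle f_1 * f_2, f_3 \rangle = \langle f_2, f_1^* * f_3 \rangle . \qquad (III.13) $$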

Thus $L^2(G)$ is at the same time a Banach algebra and a Hilbert space. Such spaces, with property (III.13), are called H*-algebras; their structure has been described by Ambrose [16] and recounted by Loomis [17]. The basic fact is that such a space is uniquely expressible as an orthogonal direct sum of its minimal (closed) two-sided ideals, each of which is isomorphic to a full matrix algebra.
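Property (III.13) is easy to test numerically. Taking the convolution normalized by 1/N, the involution $f^*(g) = \overline{f(g^{-1})}$, and the inner product $\langle f_1, f_2 \rangle = \frac{1}{N}\sum_g f_1(g)\overline{f_2(g)}$, the sketch below draws random complex functions on $S_3$ (our choice of group and random seed) and compares both sides; it also checks associativity of the convolution product.

```python
import random
from itertools import permutations

G = list(permutations(range(3)))      # S_3 as permutation tuples
N = len(G)

def mul(g, h):
    return tuple(g[h[x]] for x in range(3))

def inv(g):
    r = [0] * 3
    for x, gx in enumerate(g):
        r[gx] = x
    return tuple(r)

def convolve(f1, f2):
    # (f1 * f2)(g) = (1/N) sum_h f1(g h^-1) f2(h)
    return {g: sum(f1[mul(g, inv(h))] * f2[h] for h in G) / N for g in G}

def star(f):
    # involution f*(g) = conj(f(g^-1))
    return {g: f[inv(g)].conjugate() for g in G}

def inner(f1, f2):
    return sum(f1[g] * f2[g].conjugate() for g in G) / N

rng = random.Random(0)

def random_function():
    return {g: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for g in G}

f1, f2, f3 = random_function(), random_function(), random_function()
lhs = inner(convolve(f1, f2), f3)              # <f1 * f2, f3>
rhs = inner(f2, convolve(star(f1), f3))        # <f2, f1^* * f3>
assoc = max(abs(convolve(convolve(f1, f2), f3)[g] - convolve(f1, convolve(f2, f3))[g])
            for g in G)
print(abs(lhs - rhs), assoc)
```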

Since we are ultimately interested in using the group (Fourier) transform, it is more natural here to introduce (unitary) representations of G as the key technical tool for the study of $L^2(G)$. The representation theory of finite groups, due originally to Frobenius, and developed by Schur, Burnside, Weyl, and many others, is purely algebraic and is described in detail in many sources; for instance, the books of Keown [18] and Serre [19]. It has also been generalized, largely intact, to the case of compact groups, where analytic techniques predominate and harmonic analysis (generalized Fourier series) is often the focus. This material, originating with the Peter-Weyl theorem in 1927 (briefly outlined in Section I.2), is available in, for example, the books of Edwards [20] and Naimark-Stern [21], and in the Hewitt-Ross treatise [22].
We now review (rapidly) just those aspects of representation theory necessary to reveal the proper setting for the group (Fourier) transform and the structure of the group algebra. In general, a representation of a group G is a strongly continuous homomorphism T from G into the group of invertible operators on some complex topological vector space V. Since our major interest is in the case where G is finite, the continuity of T is trivial and without essential loss we may take dim(V) < ∞. If $(\cdot, \cdot)$ is an arbitrarily assigned inner product on V, the formula

$$ \langle u, v \rangle = \int_G (T(g)u, T(g)v) \, dm_G(g) $$

defines a new inner product on V in which the T(g) are unitary operators. So we may restrict attention to unitary representations of G.
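The averaging trick can be seen on a small example of our own construction. The sketch below conjugates the 120° rotation by a shear S to obtain a representation of the cyclic group $C_3$ whose matrices are not orthogonal. It then forms the averaged Gram matrix $M = \frac{1}{3}\sum_g T(g)^{\mathsf T} T(g)$, so that $\langle u, v \rangle = v^{\mathsf T} M u$, and checks that each T(g) preserves this new inner product.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def mat_power(X, k):
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        P = matmul(P, X)
    return P

def max_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
R = [[c, -s], [s, c]]                                           # rotation by 120 degrees
S, S_inv = [[1.0, 1.0], [0.0, 1.0]], [[1.0, -1.0], [0.0, 1.0]]  # a shear and its inverse
# A non-unitary representation of C_3: T(g^k) = S R^k S^-1
T = [matmul(matmul(S, mat_power(R, k)), S_inv) for k in range(3)]

# Averaged inner product <u, v> = (1/3) sum_g (T(g)u, T(g)v) = v^T M u
M = [[sum(matmul(transpose(Tk), Tk)[i][j] for Tk in T) / 3 for j in range(2)]
     for i in range(2)]

# Each T(g) preserves the new inner product: T(g)^T M T(g) = M
print([max_diff(matmul(matmul(transpose(Tk), M), Tk), M) for Tk in T])
```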

A representation T is irreducible if there is no nontrivial subspace of V that is invariant under all the operators T(g), g ∈ G. An easy induction shows that every representation is completely reducible, in the sense of being a direct sum of irreducible representations (for a unitary representation, the orthogonal complement of an invariant subspace is again invariant). Thus these latter are the building blocks of the general theory, although finding a complete list of them for a given group may be very difficult.

Some operator criteria for irreducibility of unitary representations are the following. Since a subspace of V is invariant under $\{T(g) : g \in G\}$ exactly when its orthogonal projection commutes with each T(g), it follows that T is irreducible if and only if the commutant of $\{T(g)\}$ consists only of scalar operators (Schur’s lemma). Similarly, one can show that the algebraic span of $\{T(g) : g \in G\}$ is the space L(V) of all operators on V exactly when T is irreducible (Burnside’s theorem).

Every irreducible representation of G is finite dimensional. (Proofs of this fundamental fact for general compact groups are often based on the eigenstructure of a compact hermitian operator, but it can be made to follow only from Schur’s lemma [23].) Of these, the simplest examples occur when dim V = 1. In this case we effectively are looking at homomorphisms of G into the circle group $\mathbb{T}$, and we have already referred to such mappings as characters. When G is abelian it turns out that the characters form a group Γ under natural operations and that G is canonically isomorphic to the dual of Γ; this is an instance of the Pontryagin duality theorem, which in fact remains valid for general LCA groups. In this abelian case it is further true that all irreducible representations are one-dimensional, hence characters of G. Finally, it can be verified that the characters constitute a frame in the group algebra $L^2(G)$; hence G and Γ are of the same order. The group Γ is called the dual group of G, and one of the tasks of the nonabelian theory is to find a suitable substitute for it that continues to shed light on the structure of G and its group algebra.

Two representations $T: G \to L(V)$ and $S: G \to L(W)$ are equivalent if there is an isomorphism $A: V \to W$ such that $A T(g) = S(g) A$, $g \in G$. By this notion, inessentially different representations are collected together in equivalence classes. If, as we assume, V and W are finite dimensional Hilbert spaces and S, T are equivalent unitary representations, then S and T are actually unitarily equivalent; the proof utilizes the polar decomposition of the isomorphism A.
One way to distinguish between equivalence classes of (unitary) representations is by an extended notion of character. Namely, if $T: G \to L(V)$ is any representation of G, define its character $\chi_T$ by

$$ \chi_T(g) = \mathrm{tr}[T(g^{-1})] . \qquad (III.14) $$

One easily checks that

$$ \chi_T(e) = \dim(V), \qquad \chi_T(g^{-1}) = \overline{\chi_T(g)} \qquad (III.15) $$

for T unitary, and that characters of equivalent representations coincide. Conversely, by decomposing a given representation into a direct sum of irreducible representations, one can show that two representations with the same character are equivalent. Thus, a representation is ‘characterized’, up to equivalence, by its character.

By calculations based on Schur’s lemma one shows that the characters associated with inequivalent irreducible representations form an orthonormal set in $L^2(G)$. Actually, the norm of the character associated with any representation is always at least one, and equals one exactly in the irreducible case.
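On $S_3$ the orthonormality can be verified by machine. The sketch below uses three well-known irreducible characters of $S_3$: the trivial one, the sign, and the character of the 2-dimensional standard representation (number of fixed points minus one). It computes their Gram matrix in $L^2(G)$ and the squared norm of the reducible permutation character, which comes out as 2, so its norm exceeds one.

```python
from itertools import permutations

G = list(permutations(range(3)))      # S_3; G[0] is the identity
N = len(G)

def fixed_points(g):
    return sum(1 for x in range(3) if g[x] == x)

def sign(g):
    # parity of the permutation via its inversion count
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if g[i] > g[j])
    return -1 if inversions % 2 else 1

# Irreducible characters of S_3 (all real-valued, so complex conjugation is omitted)
chars = {
    "trivial": {g: 1 for g in G},
    "sign": {g: sign(g) for g in G},
    "standard": {g: fixed_points(g) - 1 for g in G},
}
chi_perm = {g: fixed_points(g) for g in G}     # permutation representation on C^3

def inner(f1, f2):
    # <f1, f2> with the normalized Haar measure (real-valued inputs)
    return sum(f1[g] * f2[g] for g in G) / N

gram = {(a, b): inner(chars[a], chars[b]) for a in chars for b in chars}
perm_norm_sq = inner(chi_perm, chi_perm)
print(gram)
print(perm_norm_sq)    # 2.0: the permutation representation is reducible
```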

Now as a replacement for the dual group, we define the unitary dual object Γ to be the set of equivalence classes of irreducible unitary representations of G. From the foregoing remarks we could equally well take Γ to be the set of associated characters, the ‘irreducible characters.’ This set, denoted $\{\chi_1, \ldots, \chi_r\}$, is orthonormal in $L^2(G)$, so that $r \le \mathrm{ord}(G)$. In fact, more is known: if we denote by $d_i$ the dimension of the space of any representation associated with $\chi_i$, then we have the Burnside formula

$$ \mathrm{ord}(G) = d_1^2 + \cdots + d_r^2 . \qquad (III.16) $$

Also, each integer $d_i$ divides the order of G and also the index of the center of G in G. This last remark is useful provided that the center of G is nontrivial; this is the case, for instance, if G is a p-group, that is, ord(G) is a power of some prime p. The center then contains at least p elements.

Moving back now to the group algebra $L^2(G)$, its center consists of the so-called class functions f defined by the condition

$$ f(hgh^{-1}) = f(g), \qquad g, h \in G. $$

These are just the functions on G that are constant on each of the conjugacy classes of G. Examples are the characters of any representation of G. It turns out that the irreducible characters of G span, and hence constitute a frame for, this space of class functions. Hence the cardinality r of Γ is also the number of conjugacy classes of G. The r × r matrix whose (i, j) entry is the value of the ith irreducible character on the jth class of G is often called the character table of G.
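For $S_3$ the class count and the Burnside formula (III.16) can be checked together. The sketch below computes the conjugacy classes, tabulates the trivial, sign and standard characters (the last being the number of fixed points minus one) on class representatives, and checks that they are constant on classes, that there are as many classes as irreducible characters, and that $1^2 + 1^2 + 2^2 = 6 = \mathrm{ord}(S_3)$.

```python
from itertools import permutations

G = list(permutations(range(3)))      # S_3; G[0] is the identity
N = len(G)

def mul(g, h):
    return tuple(g[h[x]] for x in range(3))

def inv(g):
    r = [0] * 3
    for x, gx in enumerate(g):
        r[gx] = x
    return tuple(r)

def fixed_points(g):
    return sum(1 for x in range(3) if g[x] == x)

def sign(g):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if g[i] > g[j])
    return -1 if inversions % 2 else 1

chars = {"trivial": lambda g: 1, "sign": sign, "standard": lambda g: fixed_points(g) - 1}

# Conjugacy classes {h g h^-1 : h in G}, listed in order of first appearance
classes, seen = [], set()
for g in G:
    if g not in seen:
        cls = sorted({mul(mul(h, g), inv(h)) for h in G})
        seen.update(cls)
        classes.append(cls)

constant_on_classes = all(len({chi(g) for g in cls}) == 1
                          for chi in chars.values() for cls in classes)
# Character table: rows = irreducible characters, columns = classes
table = [[chi(cls[0]) for cls in classes] for chi in chars.values()]
dims = [chi(G[0]) for chi in chars.values()]      # d_i = chi_i(e)
print(table)                          # [[1, 1, 1], [1, -1, 1], [2, 0, -1]]
print(sum(d * d for d in dims), N)    # 6 6
```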

The orthonormality of the irreducible characters in $L^2(G)$ serves another purpose. It was earlier noted that any representation T of G is a direct sum of irreducibles. Using the characters of all these representations we can write an explicit formula for this, namely

$$ T \cong \bigoplus_{i=1}^{r} \langle \chi_T, \chi_i \rangle \, T_i , \qquad (III.17) $$

where each $T_i$ is a representative from an equivalence class in Γ, and the integer $\langle \chi_T, \chi_i \rangle$ is the multiplicity with which $T_i$ occurs in T.
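The multiplicity formula can be exercised on two reducible representations of $S_3$: the permutation representation on $\mathbb{C}^3$ (character: number of fixed points) and the regular representation on $L^2(G)$ (character: N at the identity, 0 elsewhere). The sketch below computes each multiplicity as $\langle \chi_T, \chi_i \rangle$; for the regular representation each irreducible $T_i$ appears $d_i$ times, consistent with the Burnside formula.

```python
from itertools import permutations

G = list(permutations(range(3)))      # S_3; G[0] is the identity
N = len(G)
identity = G[0]

def fixed_points(g):
    return sum(1 for x in range(3) if g[x] == x)

def sign(g):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if g[i] > g[j])
    return -1 if inversions % 2 else 1

irreducible = {
    "trivial": {g: 1 for g in G},
    "sign": {g: sign(g) for g in G},
    "standard": {g: fixed_points(g) - 1 for g in G},
}

def inner(f1, f2):
    return sum(f1[g] * f2[g] for g in G) / N     # real characters

def multiplicities(chi_T):
    # multiplicity of T_i in T is <chi_T, chi_i>, an integer
    return {name: round(inner(chi_T, chi)) for name, chi in irreducible.items()}

chi_perm = {g: fixed_points(g) for g in G}              # permutation rep on C^3
chi_reg = {g: N if g == identity else 0 for g in G}     # regular rep on L^2(G)
m_perm, m_reg = multiplicities(chi_perm), multiplicities(chi_reg)
print(m_perm)   # {'trivial': 1, 'sign': 0, 'standard': 1}
print(m_reg)    # {'trivial': 1, 'sign': 1, 'standard': 2}
```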

Before moving on to the structure of the group algebra we offer a few comments to draw together some loose ends. We have noted that the characters of one-dimensional unitary representations coincide with the characters introduced previously as homomorphisms of G into the circle group. These latter may be called group characters, and we know they form an abelian group Γ. When G is abelian, Γ determines G by duality. Otherwise, there are complications. The commutator subgroup G' of G is nontrivial, and the abelian quotient group G/G' has the same dual group as G. Hence the group characters only help us understand the commutative structure of G. When G is nonabelian there is at least one irreducible (unitary) representation of G on a space of dimension > 1. Further, there may be no nontrivial group characters at all; this would be the case, for instance, if G has no nontrivial normal subgroups (G is then said to be simple). In any event, when G is not abelian the dual object, also denoted Γ above, does not have a natural group structure and does not generally determine G [22, p. 57]. A successful substitute for Γ was introduced by Tannaka in 1939 for general compact groups and later axiomatized by Krein (1949) and Kelley (1963). This is the space $\mathcal{T}_G$ spanned by the coordinate or representative functions on G, that is, functions of the form $f(g) = \langle T(g^{-1})u, v \rangle$, as T runs through the classes in Γ (also called trigonometric polynomials). A certain set of linear functionals on $\mathcal{T}_G$ turns out to admit a group structure under which it is compact and naturally isomorphic with G [22, Sec. 30].

As a brief aside we remark that a very active topic of research in the period 1959-1974 was the development of a general duality theory for noncompact and nonabelian locally compact groups. Contributions to this area were made by W. Stinespring, P. Eymard, J. Ernest, K. Saito, N. Tatsuuma, M. Takesaki, C. Akemann, and M. Walter, in rough chronological order. All this work makes substantial use of the theory of operator algebras and related functional analysis, and attempts to characterize a given locally compact group G in terms of a related space of functions on G or operators on $L^2(G)$. Thus the thrust here is in a different direction from much of the earlier work on locally compact groups, which was concerned with the decomposition of specific unitary representations, generally of infinite dimension, of such a group.

The structure of the group algebra $L^2(G)$ is a consequence of properties of the set $\chi_1, \ldots, \chi_r$ of irreducible characters of $G$. As elements of $L^2(G)$ these characters are hermitian ($\chi_i^* = \chi_i$) and obey the orthogonality relations

$$\chi_i * \chi_j = \delta_{ij}\, d_i^{-1} \chi_i .$$

Hence the (two-sided) principal ideals $J_i$ generated by the $\chi_i$ are orthogonal subspaces of $L^2(G)$; for this, the formula

$$\langle f, h \rangle = (f * h^*)(e)$$

is helpful. The set $\{J_1, \ldots, J_r\}$ of these ideals is exactly the set of minimal two-sided ideals in $L^2(G)$, and we have the decomposition

$$L^2(G) = J_1 \oplus \cdots \oplus J_r \qquad \text{(III.18)}$$

with corresponding orthogonal projection $P_i : L^2(G) \to J_i$ given by

$$P_i(f) = f * d_i \chi_i . \qquad \text{(III.19)}$$

The ideal $J_i$ can also be defined as consisting of all functions on $G$ of the form

$$f_A(g) = \operatorname{tr}\left[A \cdot T_i(g^{-1})\right] , \qquad \text{(III.20)}$$

where $A$ is an arbitrary operator on the space $V_i$ of the irreducible representation $T_i$. If $\{e_n^{(i)}\}$ is a frame in $V_i$, then a frame for the ideal $J_i$ is $\{f_{mn}^{(i)}\}$, where

$$f_{mn}^{(i)}(g) = \sqrt{d_i}\, \langle T_i(g^{-1})\, e_n^{(i)}, e_m^{(i)} \rangle . \qquad \text{(III.21)}$$

The correspondence $f_A \leftrightarrow A$ defined by Equation (III.20) above sets up an isomorphism between $J_i$ and the full operator algebra $L(V_i)$, $i = 1, \ldots, r$. An essentially inverse isomorphism may be achieved by expanding $f \in J_i$ in the frame elements $f_{mn}^{(i)}$,

$$f = \sum_{m,n} c_{mn}^{(i)} f_{mn}^{(i)} ,$$

and making the $d_i \times d_i$ matrix $[c_{mn}^{(i)}]$ correspond to $f$. Under the first correspondence, the central character $\chi_i$ in $J_i$ maps into the identity of $L(V_i)$; since $\chi_i * \chi_i = d_i^{-1} \chi_i$, the idempotent $d_i \chi_i$ serves as the identity of $J_i$.
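
The orthogonality relations, the projections (III.19), and the frame (III.21) can be checked numerically on the smallest nonabelian group. The following is a minimal NumPy sketch, assuming a concrete model of $S_3$ as permutations of $\{0, 1, 2\}$ with its trivial, sign, and two-dimensional (standard) representations; this model and the helper names `mul`, `inv`, `REPS`, and `conv` are our own illustrative choices, not notation from the report. Characters are taken as $\chi_i(g) = \operatorname{tr} T_i(g^{-1})$, the convention consistent with $T_i(\chi_i) = d_i^{-1} 1_i$ in Section III.3; for $S_3$ the characters are real, so this agrees with the usual trace.

```python
import itertools
import numpy as np

# Illustrative model of S_3: permutations of (0, 1, 2), (a*b)(k) = a(b(k)).
G = list(itertools.permutations(range(3)))
N = len(G)
idx = {g: k for k, g in enumerate(G)}

def mul(a, b):
    return tuple(a[b[k]] for k in range(3))

def inv(a):
    return tuple(sorted(range(3), key=lambda k: a[k]))

def perm_matrix(a):
    P = np.zeros((3, 3))
    P[list(a), np.arange(3)] = 1.0  # P e_k = e_{a(k)}
    return P

# Trivial, sign, and the 2-dimensional representation obtained by restricting
# the permutation action to the plane orthogonal to (1, 1, 1).
B = np.column_stack([[1, -1, 0], [1, 1, -2]]) / np.array([np.sqrt(2), np.sqrt(6)])
REPS = {g: [np.eye(1), np.array([[np.linalg.det(perm_matrix(g))]]),
            B.T @ perm_matrix(g) @ B] for g in G}
d = [1, 1, 2]

def conv(f, h):
    """(f*h)(x) = (1/N) sum_g f(g) h(g^{-1} x): Haar measure of total mass one."""
    return np.array([sum(f[idx[g]] * h[idx[mul(inv(g), x)]] for g in G) / N
                     for x in G])

# Characters chi_i(g) = tr T_i(g^{-1}).
chi = [np.array([np.trace(REPS[inv(g)][i]) for g in G]) for i in range(3)]

# Orthogonality relations: chi_i * chi_j = delta_ij chi_i / d_i.
for i in range(3):
    for j in range(3):
        expected = chi[i] / d[i] if i == j else np.zeros(N)
        assert np.allclose(conv(chi[i], chi[j]), expected)

# Projections (III.19) P_i(f) = f * d_i chi_i split f into orthogonal pieces.
rng = np.random.default_rng(0)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
parts = [conv(f, d[i] * chi[i]) for i in range(3)]
assert np.allclose(sum(parts), f)
assert abs(np.vdot(parts[0], parts[2])) < 1e-12

# Frame (III.21): f_mn^(i)(g) = sqrt(d_i) [T_i(g^{-1})]_{mn}, orthonormal in L^2(G).
U = np.array([[np.sqrt(d[i]) * REPS[inv(g)][i][m, n] for g in G]
              for i in range(3) for m in range(d[i]) for n in range(d[i])]).T
assert np.allclose(U.conj().T @ U / N, np.eye(N))
```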

All these results generalize more or less directly to the case where $G$ is a separable compact group. Expansion of an arbitrary $f \in L^2(G)$ in all the frame bases (III.21) as $T_i$ runs through a (countable) complete set of irreducible (finite-dimensional) unitary representations of $G$ yields the group Fourier series for $f$. Such an expansion reduces to the classical Fourier expansion of a periodic function when $G$ is the circle group.

We’ll close this section with a comment about the regular representation(s) of a group $G$. Its decomposition into irreducible components contains the essence of harmonic analysis on groups and provides much motivation for the foregoing results. The right regular representation of $G$ on $L^2(G)$ is defined by

$$(R(g_0) f)(g) = f(g g_0) .$$

There is also a left regular representation, which is unitarily equivalent to $R$. These unitary representations are natural extensions of the translation group $\{\tau_g : g \in G\}$ discussed in Section II.3 for lca groups $G$.

A subspace $M$ of $L^2(G)$ is invariant if $R(g)M \subseteq M$, $g \in G$, and a general goal of harmonic analysis is to decompose $L^2(G)$ into a direct sum of such subspaces. In this context, if we fix an index $i$, $1 \le i \le r$, and consider functions $f_A$ as defined by Equation (III.20), we see that

$$R(g) f_A = f_{A\, T_i(g)^*} , \qquad g \in G,$$

so that the ideals $J_i$ in the decomposition (III.18) are invariant.

In another approach to harmonic analysis one can begin with the representation $R$ and attempt to decompose it into a sum of irreducible representations of finite dimension. We know this is possible for groups of finite order, and in fact it continues to be true for all compact groups. But in general, a locally compact group need not have any nontrivial finite-dimensional unitary representations, irreducible or not (certain connected semisimple Lie groups, for example). Actually, in our elementary situation it follows from Equation (III.17) that every irreducible representation occurs in the regular representation with a multiplicity equal to its dimension. This too remains valid when $G$ is compact. In the general locally compact case, whether or not a particular irreducible unitary representation occurs in the regular representation depends on further assumptions about the group. If, for example, there is a Plancherel measure on $\Gamma$ (as there is when $G$ is unimodular and type I), then the points in its support are exactly those contained in the regular representation.

When $G$ is finite, as we are assuming in this chapter, there is a natural isomorphism between the group algebra and the commutant of the regular representation. This is the map that assigns to each $f \in L^2(G)$ the operator of (left) convolution with $f$, $h \mapsto f * h$. Now the commutant of any finite-dimensional representation is a direct sum of full matrix algebras (Schur’s lemma again), so in this fashion it is possible to derive anew the basic decomposition (III.18) of the group algebra.
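
The commutant statement can be checked in a small case. The sketch below is an illustrative construction for $S_3$ modeled as permutations of $\{0, 1, 2\}$ (our own choice; the helper names are hypothetical). It verifies that left convolution $h \mapsto f * h$ commutes with every right translation $R(g_0)$, and computes the dimension of the commutant of $R$ by solving $R(g)X = XR(g)$ for all $g$ as one linear system. A direct sum of full matrix algebras $L(V_i)$ should give dimension $\sum_i d_i^2 = 1 + 1 + 4 = N = 6$.

```python
import itertools
import numpy as np

# Illustrative model of S_3: permutations of (0, 1, 2), (a*b)(k) = a(b(k)).
G = list(itertools.permutations(range(3)))
N = len(G)
idx = {g: k for k, g in enumerate(G)}

def mul(a, b):
    return tuple(a[b[k]] for k in range(3))

def inv(a):
    return tuple(sorted(range(3), key=lambda k: a[k]))

def right_translation_matrix(g0):
    """Matrix of (R(g0) f)(g) = f(g g0) acting on vectors indexed like G."""
    Rm = np.zeros((N, N))
    for g in G:
        Rm[idx[g], idx[mul(g, g0)]] = 1.0
    return Rm

def left_convolution_matrix(f):
    """Matrix of h -> f*h, where (f*h)(x) = (1/N) sum_y f(x y^{-1}) h(y)."""
    return np.array([[f[idx[mul(x, inv(y))]] / N for y in G] for x in G])

Rs = [right_translation_matrix(g) for g in G]
f = np.random.default_rng(6).standard_normal(N)
Lf = left_convolution_matrix(f)
assert all(np.allclose(Lf @ Rm, Rm @ Lf) for Rm in Rs)

# Commutant of R: all X with R(g) X = X R(g). With row-major flattening,
# vec(A X B) = kron(A, B.T) vec(X), so each g contributes kron(R, I) - kron(I, R.T).
I = np.eye(N)
A = np.vstack([np.kron(Rm, I) - np.kron(I, Rm.T) for Rm in Rs])
commutant_dim = N * N - np.linalg.matrix_rank(A)
print(commutant_dim)  # 6 = 1**2 + 1**2 + 2**2
```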

III.3 GROUP TRANSFORMS

The importance of Fourier methods in signal processing was briefly recalled and emphasized in Section I.2, and the Fourier transforms were defined in Equations (I.1) and (I.2). At an abstract level, which we will not stress, one can think of the Fourier transform together with an accompanying Plancherel theorem as an explicit solution to the general problem of decomposing the regular representation into irreducible components. The search for such a formula for various noncompact and nonabelian groups has been a major theme of group representation theory; however, reviewing it is not germane to the present discussion. We will continue to look at the case of finite groups, where all such existence questions are trivial, and to view the Fourier transform simply as a particular kind of unitary transform with interesting properties and structure, and possible relevance to discrete data processing.

We can ease into the definition of the Fourier transform through the idea of extending a given group representation to the group algebra. Let $G$ be a compact group and $T: G \to L(V)$ a finite-dimensional representation. For $f \in L^1(G)$ we define

$$T(f) = \int_G f(g)\, T(g)\, dm_G(g) . \qquad \text{(III.22)}$$

This extends $T$ to be a continuous representation of the algebra $L^1(G)$ by operators on $V$:

$$\|T(f)\| \le B \|f\|_1 , \qquad T(f * h) = T(f) \cdot T(h) , \qquad \text{(III.23)}$$

if $B$ is a bound on $\{\|T(g)\| : g \in G\}$. When $V$ is a Hilbert space and the representation is unitary, then

$$T(f^*) = T(f)^*$$

and the extended $T$ is a *-representation of the algebra $L^1(G)$. Since $G$ is compact and of Haar measure one, $L^2(G) \subseteq L^1(G)$ via $\|f\|_1 \le \|f\|_2$, so that $T$ is also a norm-decreasing *-representation of $L^2(G)$. When $G$ is finite, of order $N$, Haar measure assigns mass $1/N$ to each point, and Equation (III.22) simply becomes

$$T(f) = \frac{1}{N} \sum_{g \in G} f(g)\, T(g) . \qquad \text{(III.24)}$$

Three brief comments about this construction are appropriate. First, it is reversible, so that we in fact have a one-to-one correspondence between unitary representations of $G$ and nondegenerate *-representations of $L^1(G)$. Second, the extended $T$ has the same commutant as the group representation; hence if one is irreducible, so is the other. Third, using the invariance of Haar measure, one easily verifies that

$$T[R(g_0) f] = T(f)\, T(g_0)^* , \qquad \text{(III.25)}$$

where $R$ is the (right) regular representation of $G$. Hence from this formula and the earlier Equation (III.23) we see that these *-representations of $L^2(G)$ send convolution products and translations into certain operator products. These are, of course, generalizations of familiar and valuable properties of Fourier transforms, which we next define.
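
As a hedged numerical check of (III.23), (III.25), and the *-property $T(f^*) = T(f)^*$, the sketch below implements the finite-group extension (III.24) for an illustrative model of $S_3$ (permutations of $\{0, 1, 2\}$ with trivial, sign, and standard two-dimensional representations; this model is our own construction). The names `extend`, `right_translate`, and `star` are hypothetical helpers.

```python
import itertools
import numpy as np

# Illustrative model of S_3: permutations of (0, 1, 2), (a*b)(k) = a(b(k)).
G = list(itertools.permutations(range(3)))
N = len(G)
idx = {g: k for k, g in enumerate(G)}

def mul(a, b):
    return tuple(a[b[k]] for k in range(3))

def inv(a):
    return tuple(sorted(range(3), key=lambda k: a[k]))

def perm_matrix(a):
    P = np.zeros((3, 3))
    P[list(a), np.arange(3)] = 1.0  # P e_k = e_{a(k)}
    return P

B = np.column_stack([[1, -1, 0], [1, 1, -2]]) / np.array([np.sqrt(2), np.sqrt(6)])
REPS = {g: [np.eye(1), np.array([[np.linalg.det(perm_matrix(g))]]),
            B.T @ perm_matrix(g) @ B] for g in G}

def extend(f, i):
    """T_i(f) = (1/N) sum_g f(g) T_i(g), Equation (III.24)."""
    return sum(f[idx[g]] * REPS[g][i] for g in G) / N

def conv(f, h):
    """(f*h)(x) = (1/N) sum_g f(g) h(g^{-1} x)."""
    return np.array([sum(f[idx[g]] * h[idx[mul(inv(g), x)]] for g in G) / N
                     for x in G])

def right_translate(f, g0):
    """(R(g0) f)(g) = f(g g0)."""
    return np.array([f[idx[mul(g, g0)]] for g in G])

def star(f):
    """Involution f*(g) = conj(f(g^{-1}))."""
    return np.array([np.conj(f[idx[inv(g)]]) for g in G])

rng = np.random.default_rng(1)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
for i in range(3):
    Tf, Th = extend(f, i), extend(h, i)
    assert np.allclose(extend(conv(f, h), i), Tf @ Th)       # (III.23)
    assert np.allclose(extend(star(f), i), Tf.conj().T)      # *-representation
    for g0 in G:                                             # (III.25)
        assert np.allclose(extend(right_translate(f, g0), i),
                           Tf @ REPS[g0][i].conj().T)
```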

From now on we restrict attention to finite groups $G$, denoting $\operatorname{ord}(G)$ by $N$ and the unitary dual object of $G$ by $\Gamma$. We let $T_i: G \to L(V_i)$ be a representative of the $i$th class of $\Gamma$, with $\dim(V_i) = d_i$, $1 \le i \le r \le N$. Recall that $r = N$ only for abelian groups. Each $T_i$ extends to $L^2(G)$ by formula (III.24), and we let $T$ be the product map

$$T = (T_1, \ldots, T_r) : L^2(G) \to \prod_{i=1}^{r} L(V_i) . \qquad \text{(III.26)}$$

It is a consequence of the structure theory for $L^2(G)$ recounted in the previous section that each $T_i$ defines, by restriction, an isomorphism between $J_i$ in Equation (III.18) and $L(V_i)$. Indeed, $T_i$ is surjective by Burnside’s theorem, and injective by the formula

$$(f * \chi_i)(g) = \operatorname{tr}\left[T_i(f) \cdot T_i(g)^*\right] ; \qquad \text{(III.27)}$$

recall from Equation (III.19) that the left side above is $d_i^{-1} f(g)$ if $f \in J_i$. This last fact implies that

$$T_i(\chi_i) = d_i^{-1}\, 1_i ,$$

where $1_i$ is the identity operator on $V_i$. Now we can see that the inverse of $T_i$ on $L(V_i)$ is defined by

$$T_i^{-1}(A) = d_i f_A , \qquad \text{(III.28)}$$

where $f_A$ was defined by Equation (III.20). Since the operators $T_i(g_0)$ span $L(V_i)$ (Burnside’s theorem again), it suffices to check this for $A = T_i(g_0)$, $g_0 \in G$:
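
$$
\begin{aligned}
T_i(d_i f_A) &= \frac{d_i}{N} \sum_{g} \operatorname{tr}\left[T_i(g_0)\, T_i(g^{-1})\right] T_i(g) \\
&= \frac{d_i}{N} \sum_{g} \operatorname{tr}\left[T_i(g_0 g^{-1})\right] T_i(g) \\
&= \frac{d_i}{N} \sum_{h} \operatorname{tr}\left[T_i(h^{-1})\right] T_i(h g_0) \qquad (h = g g_0^{-1}) \\
&= d_i \Big[ \frac{1}{N} \sum_{h} \operatorname{tr}\left[T_i(h^{-1})\right] T_i(h) \Big] T_i(g_0) \\
&= d_i\, T_i(\chi_i)\, T_i(g_0) \\
&= 1_i\, T_i(g_0) = T_i(g_0) .
\end{aligned}
$$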

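From Equation (III.28) in turn we see that

$$T^{-1}(A_1, \ldots, A_r) = f ,$$

where

$$P_i(f) = d_i f_{A_i} , \qquad 1 \le i \le r .$$

This yields a complete analysis and synthesis of an arbitrary function $f \in L^2(G)$:

$$f(g) = \sum_{i=1}^{r} d_i \operatorname{tr}\left[T_i(g^{-1}) \cdot T_i(f)\right] . \qquad \text{(III.29)}$$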

The group (Fourier) transform is the mapping $T$ defined by Equations (III.24) and (III.26), with inverse transform defined by Equation (III.29) above. We will hereafter refer to $T$ simply as the group transform, denoted $F_G$ to emphasize its dependence on the group $G$; we also write $\hat f = F_G(f)$.
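
The forward transform (III.24)/(III.26), the inversion formula (III.29), and the inverse map (III.28) can be exercised numerically. This is a minimal sketch assuming a concrete permutation model of $S_3$ (trivial, sign, and standard two-dimensional representations; our own construction) and NumPy; `transform` and `inverse` are illustrative names, not an established API.

```python
import itertools
import numpy as np

# Illustrative model of S_3: permutations of (0, 1, 2), (a*b)(k) = a(b(k)).
G = list(itertools.permutations(range(3)))
N = len(G)
idx = {g: k for k, g in enumerate(G)}

def inv(a):
    return tuple(sorted(range(3), key=lambda k: a[k]))

def perm_matrix(a):
    P = np.zeros((3, 3))
    P[list(a), np.arange(3)] = 1.0  # P e_k = e_{a(k)}
    return P

B = np.column_stack([[1, -1, 0], [1, 1, -2]]) / np.array([np.sqrt(2), np.sqrt(6)])
REPS = {g: [np.eye(1), np.array([[np.linalg.det(perm_matrix(g))]]),
            B.T @ perm_matrix(g) @ B] for g in G}
d = [1, 1, 2]

def transform(f):
    """F_G(f) = (T_1(f), ..., T_r(f)), Equations (III.24) and (III.26)."""
    return [sum(f[idx[g]] * REPS[g][i] for g in G) / N for i in range(3)]

def inverse(F):
    """Inversion formula (III.29): f(g) = sum_i d_i tr[T_i(g^{-1}) F_i]."""
    return np.array([sum(d[i] * np.trace(REPS[inv(g)][i] @ F[i]) for i in range(3))
                     for g in G])

rng = np.random.default_rng(2)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
assert np.allclose(inverse(transform(f)), f)

# (III.28): for A in L(V_i), d_i f_A with f_A(g) = tr[A T_i(g^{-1})]
# has transform A in slot i and zero in the other slots.
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
fA = np.array([np.trace(A @ REPS[inv(g)][2]) for g in G])
F = transform(d[2] * fA)
assert np.allclose(F[2], A) and np.allclose(F[0], 0) and np.allclose(F[1], 0)
```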

We conclude the first half of this section with several comments about this definition, along with one more key property (the Plancherel theorem); then we’ll look at some examples and discuss the complexity issue.

First, because we are limited to finite groups, there are no convergence or integrability issues, and the inversion formula (III.29) is always valid. Second, the definitions of convolution, representation and extended *-representation, and character of a representation, together with the normalization of Haar measure on the group, must all be chosen carefully and consistently to make the various important properties of the group transform work out, especially those describing the transform of convolutions (III.23) and translations (III.25), and inversion (III.29). Other definitions appear in the literature: our $f * h$ may be another author’s $h * f$ or $(1/N) f * h$, our character another’s conjugate character, etc. All of these are equally valid, as long as they are followed consistently. We should also recognize that there is a certain nonuniqueness in our definition of the group transform, in that it depends on a specific choice of representation $T_i$ from each class in $\Gamma$. However, this nonuniqueness is really inessential, as both the dimensions $d_i$ and the characters $\chi_i$ are well defined, and hence so are the projections $P_i$ defined by Equation (III.19), etc.

At this point we should recall that the major theme of this chapter is the use of unitary transforms for data processing. We now want to see that the group transform as just defined is indeed unitary. This involves checking that its range has a Hilbert space structure, and that

$$\|\hat f\|_2 = \|f\|_2 , \qquad f \in L^2(G) . \qquad \text{(III.30)}$$

We do this by noting that the range of $F_G$ is, according to the definition of $T$ in (III.26), just the direct product of the operator algebras $L(V_i)$, $i = 1, \ldots, r$. Since each $V_i$ is finite-dimensional, each of these algebras is actually an $H^*$-algebra under the Hilbert-Schmidt inner product $\langle A, B \rangle = \operatorname{tr}(AB^*)$. Hence the product space is also an $H^*$-algebra under the inner product

$$\langle (A_1, \ldots, A_r), (B_1, \ldots, B_r) \rangle = \sum_{i=1}^{r} \langle A_i, B_i \rangle .$$

We will denote this product space $L^2(\Gamma)$, since it can also be thought of as the space of all functions on $\Gamma$ whose value at the $i$th class is an operator on $V_i$. If a measure $\mu$ on the discrete space $\Gamma$ is defined by assigning the value $d_i$ to the $i$th class, then $\phi = (A_1, \ldots, A_r) \in L^2(\Gamma)$ has the norm

$$\|\phi\|^2 = \int_\Gamma \|\phi(\gamma)\|^2 \, d\mu(\gamma) = \sum_{i=1}^{r} d_i \operatorname{tr}(A_i A_i^*) . \qquad \text{(III.31)}$$


What we claim is that $F_G : L^2(G) \to L^2(\Gamma)$ is unitary, in that the relation (III.30) holds when $\|\hat f\|$ is defined by (III.31). Specifically, we have the Plancherel theorem for the finite group $G$:

$$\langle f, h \rangle = \sum_{i=1}^{r} d_i \operatorname{tr}\left[T_i(f)\, T_i(h)^*\right] \qquad \text{(III.32)}$$

for all $f, h \in L^2(G)$. (By contrast, the formula obtained from Equation (III.29) by setting $g = e$ (the group identity) is often called Plancherel’s formula. It relates $f \in L^2(G)$ to its scalar (not operator!)-valued Fourier transform, that is, to the scalar function on $\Gamma$ whose value at the $i$th class is $\langle f, \chi_i \rangle = \operatorname{tr} T_i(f)$, $1 \le i \le r$. Of course, this transform is one-to-one only when $G$ is abelian.)
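
A quick numerical check of the Plancherel theorem (III.32), and of the scalar formula obtained by setting $g = e$ in (III.29), is sketched below. It assumes an illustrative permutation model of $S_3$ (our own construction) and the normalized inner product $\langle f, h \rangle = N^{-1} \sum_g f(g) \overline{h(g)}$ on $L^2(G)$; `transform` is a hypothetical helper name.

```python
import itertools
import numpy as np

# Illustrative model of S_3: permutations of (0, 1, 2).
G = list(itertools.permutations(range(3)))
N = len(G)
idx = {g: k for k, g in enumerate(G)}

def perm_matrix(a):
    P = np.zeros((3, 3))
    P[list(a), np.arange(3)] = 1.0  # P e_k = e_{a(k)}
    return P

B = np.column_stack([[1, -1, 0], [1, 1, -2]]) / np.array([np.sqrt(2), np.sqrt(6)])
REPS = {g: [np.eye(1), np.array([[np.linalg.det(perm_matrix(g))]]),
            B.T @ perm_matrix(g) @ B] for g in G}
d = [1, 1, 2]

def transform(f):
    """(T_1(f), ..., T_r(f)) with T_i(f) = (1/N) sum_g f(g) T_i(g)."""
    return [sum(f[idx[g]] * REPS[g][i] for g in G) / N for i in range(3)]

rng = np.random.default_rng(3)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
F, H = transform(f), transform(h)

# Plancherel theorem (III.32).
lhs = np.vdot(h, f) / N          # <f, h> = (1/N) sum_g f(g) conj(h(g))
rhs = sum(d[i] * np.trace(F[i] @ H[i].conj().T) for i in range(3))
assert np.isclose(lhs, rhs)

# Scalar version: g = e in (III.29) gives f(e) = sum_i d_i tr T_i(f).
assert np.isclose(f[idx[(0, 1, 2)]], sum(d[i] * np.trace(F[i]) for i in range(3)))
```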

In essence, this formula is just a reflection of the orthogonal decomposition (III.18). If, for example, we make use of the frames in $J_i$ defined by Equation (III.21), then

$$
\begin{aligned}
\|f\|_2^2 &= \sum_{i=1}^{r} \sum_{m,n=1}^{d_i} \left|\langle f, f_{mn}^{(i)} \rangle\right|^2 \\
&= \sum_{i=1}^{r} d_i \Big[ \sum_{m,n} \left|\langle T_i(f)\, e_n^{(i)}, e_m^{(i)} \rangle\right|^2 \Big] \\
&= \sum_{i=1}^{r} d_i \operatorname{tr}\left[T_i(f)\, T_i(f)^*\right] ,
\end{aligned}
$$

by definition of trace. By polarization, this is equivalent to Equation (III.32). Alternatively, one can simply expand the right-hand side of Equation (III.32) and employ the formula

$$d_1 \chi_1 + \cdots + d_r \chi_r = N \cdot \mathrm{id} , \qquad \text{(III.33)}$$

where $\mathrm{id}$ is the indicator function of the group identity $e$; with our normalization of convolution, $N \cdot \mathrm{id}$ is the identity element of the group algebra. This latter formula arises again from the decomposition (III.18) and the projection formula (III.19) since, for any $f \in L^2(G)$,

$$f = \sum_{i=1}^{r} P_i(f) = \sum_{i=1}^{r} f * d_i \chi_i = f * \Big( \sum_{i=1}^{r} d_i \chi_i \Big) .$$

It also arises directly from the inversion formula (III.29) by the substitution $f = \mathrm{id}$.

This completes the background development in Fourier analysis for finite groups. We note that all these formulas generalize rather directly to general compact groups. There, as in Equation (III.31), a Plancherel measure exists and assigns finite mass $d_i$ to each class $\{T_i\}$ in $\Gamma$. The Plancherel theorem for lca groups and for certain locally compact nonabelian groups is discussed in References [11] and [13], respectively, for Chapter I. This theorem and the accompanying inversion formula specify quantitatively the way in which the irreducible representations of a group $G$ permit a harmonic analysis of square-integrable functions on $G$.

Before moving on to the complexity of the group transform and its data processing applications, we digress briefly to the topic of positive-definite functions. We will just consider these on finite groups, although much (but not all) of what we say extends to general locally compact groups. By definition, a complex-valued function $\phi$ on the group $G$ is positive-definite if for every finite set $g_1, \ldots, g_n$ of elements of $G$, the matrix

$$\left[\phi(g_i^{-1} g_j)\right]$$

is positive semidefinite. In particular, it follows that

$$\phi(e) \ge 0 , \qquad |\phi(g)| \le \phi(e) , \qquad \phi(g^{-1}) = \overline{\phi(g)}$$

for all $g \in G$. If $U: G \to L(V)$ is a unitary representation of $G$ on a Hilbert space $V$, and $v \in V$, then

$$\phi(g) = \langle U(g) v, v \rangle$$

defines a positive-definite function on $G$ and, in fact, all such functions arise in this way.

Positive-definite functions are of interest for data processing because they occur as autocorrelation functions in one of two ways. Suppose first that $f_1$, $f_2$ are functions on $G$. Their cross-correlation function is defined by

$$\rho_{1,2}(x) = \int_G \overline{f_1(g)}\, f_2(gx)\, dm_G(g) = \frac{1}{N} \sum_{g \in G} \overline{f_1(g)}\, f_2(gx)$$

for $x \in G$. It is easy to check that

$$\rho_{1,2}(x) = (f_1^* * f_2)(x) , \qquad x \in G,$$

and hence that in the case $f_1 = f_2$,

$$\rho_{1,1} = f_1^* * f_1 , \qquad \text{(III.34)}$$

the autocorrelation function of $f_1$. As elements of the group algebra, functions of this latter form are called hermitian squares, and they are positive-definite by virtue of

$$\rho_{1,1}(x) = \langle R(x) f_1, f_1 \rangle ,$$

where $R$ is the right regular representation of $G$.
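
For the cyclic group $\mathbb{Z}_N$ (written additively), the autocorrelation (III.34) and its transform are easy to check with NumPy. The sketch below assumes the labelling $T_k(g) = e^{-2\pi i k g/N}$ of the one-dimensional representations, so that (III.24) becomes `np.fft.fft(f) / N`; the value $N = 8$ and the random test data are arbitrary choices. It verifies that the matrix $[\rho(g_i^{-1} g_j)]$ is positive semidefinite and that $T_k(\rho) = |T_k(f)|^2 \ge 0$, which anticipates the link between positive-definite functions and positive transforms discussed below.

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Autocorrelation on Z_N (group law: addition mod N):
#   rho(x) = (1/N) sum_g conj(f(g)) f(g + x) = <R(x) f, f>,  (R(x) f)(g) = f(g + x).
rho = np.array([np.mean(np.conj(f) * np.roll(f, -x)) for x in range(N)])

# Positive-definiteness: the matrix [rho(g_i^{-1} g_j)] = [rho(j - i)] is Hermitian PSD.
M = np.array([[rho[(j - i) % N] for j in range(N)] for i in range(N)])
assert np.allclose(M, M.conj().T)
assert np.linalg.eigvalsh(M).min() > -1e-10

# Transform side: T_k(f) = fft(f) / N and T_k(f* * f) = conj(T_k(f)) T_k(f) >= 0.
T_rho = np.fft.fft(rho) / N
T_f = np.fft.fft(f) / N
assert np.allclose(T_rho, np.abs(T_f) ** 2)
```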

If the functions $f_1$, $f_2$ are thought of as sample functions of a stochastic process on $G$, the autocorrelation and cross-correlation functions are essential components of Wiener’s generalized harmonic analysis, although this term is usually applied when $G$ is the group of real numbers (and then the definition of $\rho_{1,2}$ must be modified to account for the infinite Haar measure of $G$).

For the second example we proceed as in Section II.2 and consider a stochastic process $\{x_g : g \in G\}$ which is weakly stationary in that there is a unitary representation $U$ of $G$ on $L_0^2(P)$ with $x_g = U(g) \cdot x_e$, $g \in G$. The function

$$\rho(g) = E(x_g\, \overline{x_e})$$

then satisfies

$$E(x_g\, \overline{x_h}) = \rho(h^{-1} g) , \qquad \text{(III.35)}$$

and may again be called an autocorrelation function. However, to distinguish between these two cases we will refer to it as the covariance function of the process. Clearly

$$\rho(g) = \langle x_g, x_e \rangle = \langle U(g)\, x_e, x_e \rangle , \qquad \text{(III.36)}$$

so that $\rho$ is a positive-definite function on $G$. This key property of $\rho$ is easily established because of the right choice of definition and the nontrivial characterization of positive-definite functions mentioned above. Finiteness of $G$ is not at all needed for these results.

One other important example of positive-definite functions is the character $\chi_T$, as defined by Equation (III.14), of a finite-dimensional unitary representation $T$ of $G$. The proof follows, again, from the above characterization and the fact that the positive-definite functions form a convex cone in the group algebra. This cone, denoted $PD(G)$, is also closed under conjugation, involution, and products (that the product of positive-definite functions is again positive-definite is a nice application of Schur’s result that the Hadamard product of positive semidefinite matrices is again positive semidefinite).

To see some other characterizations of positive-definite functions, let

$$S(G) = \{\text{hermitian squares}\} = \{ f : f = h^* * h,\ h \in L^2(G) \},$$

and let $M_l(G)$ [resp., $M_r(G)$] be the set of $f \in L^2(G)$ for which left (resp., right) multiplication by $f$ is a positive semidefinite operator. Then it is not difficult to show that

$$PD(G) = {}^{*}S(G) = M_l(G) = M_r(G),$$

where ${}^{*}S(G)$ means the dual cone of $S(G)$, that is, ${}^{*}S = \{ f : \langle s, f \rangle \ge 0,\ s \in S \}$.

Now for finite groups it can be shown that actually

$${}^{*}S(G) = S(G) ;$$

in other words, $S(G)$ is a self-dual cone in the group algebra $L^2(G)$. This is a consequence of the structure theory for $L^2(G)$ reviewed in Section III.2 together with the analogous fact for the $H^*$-algebra $L(V)$, $V$ a finite-dimensional Hilbert space [24].

Finally, we define a positive function to be one with a positive Fourier transform. That is, we set

$$P(G) = \{ f \in L^2(G) : \hat f \ge 0 \} ;$$

this means that each operator $T_i(f)$ is positive semidefinite on the $i$th representation space $V_i$ as $T_i$ runs through $\Gamma$. To complete our circle of characterizations of $PD(G)$, we claim that

$$P(G) = M_l(G) .$$

This can be seen in various ways. For example, the Plancherel theorem (III.32) implies that

$$\langle f * h, h \rangle = \sum_{i=1}^{r} d_i \langle T_i(f)\, T_i(h), T_i(h) \rangle$$

for each $f, h \in L^2(G)$. Keeping in mind the fact mentioned above that the positive semidefinite operators on a finite-dimensional Hilbert space form a self-dual cone, it follows from $f \in P(G)$ that $\langle f * h, h \rangle \ge 0$ for all $h \in L^2(G)$, and so $f \in M_l(G)$. Conversely, it is clear from basic properties of the group transform that $S(G) \subseteq P(G)$, and we already know, for finite groups $G$, that $M_l(G) = S(G)$.

The relation $P(G) = S(G)$ can be considered as an analogue of the classical Fejér-Riesz theorem about non-negative trigonometric polynomials, this being essentially equivalent to the corresponding relation $P(\mathbb{Z}) = S(\mathbb{Z})$ for the integer group $\mathbb{Z}$. However, not much more generality is possible; for example, $P(G) \ne S(G)$ when $G = \mathbb{Z} \oplus \mathbb{Z}$ [24].

For signal processing applications the most important of the above characterizations is $PD(G) = P(G)$, showing that with each positive-definite function, and in particular with each autocorrelation and covariance function, there is associated ‘something positive’. In the more familiar case where $G$ is abelian, this ‘something’ is just a function on the dual group $\Gamma$ with non-negative real values. Guided by the more general case of lca groups, it is more useful to think of this function as a measure on $\Gamma$. This brings us into conformity with the viewpoint of Section II.2 where, via Bochner’s theorem for lca groups, the covariance function was viewed as the Fourier transform of a positive measure on the dual group. That measure was termed the spectral measure of the underlying stochastic process. It is the measure or its derivative, the spectral density function, that is the object of estimation procedures in the field of spectrum estimation. This is not an area that we are going to discuss, except to note that prior to the more modern high-resolution methods, a sample function was used to make estimates. Either the autocorrelation function was first estimated and then Fourier transformed to yield a spectrum estimate (Blackman-Tukey approach), or else a DFT was applied directly to the data and then a multiple of its squared magnitude served as the estimate (periodogram approach). There are many issues here that must be resolved before convergence of such estimates can be guaranteed, and techniques of time-domain windowing or frequency-domain smoothing play a key role [25, 26].

The dual relation between an autocorrelation or covariance function and its group transform is called the Wiener-Khinchine relation, as already noted following Equation (II.2). When the underlying group is not abelian, the resulting transform is a (positive semidefinite) operator-valued function on the dual object $\Gamma$. Much of the rest of this report deals with tentative data processing applications of this general setup.

We first note a purely mathematical formula involving positive-definite functions, and then give it a statistical interpretation. Let $\phi \in PD(G)$. Then $\phi$ defines a positive linear functional $\Phi$ on $L^2(G)$ by virtue of $PD(G) = {}^{*}S(G)$. That is, we have

$$\Phi(f) = \langle f, \phi \rangle$$

and

$$0 \le \Phi(f^* * f) = \langle f^* * f, \phi \rangle = \langle f, f * \phi \rangle = \langle \hat f, \hat f R \rangle , \qquad \text{(III.37)}$$

where the operator $R$ on the space of the Fourier transforms $\hat f$ [as given in Equation (III.26)] acts by right multiplication and has components $d_i T_i(\phi)$ in each $L(V_i)$, $1 \le i \le r$. When $G$ is abelian, each $d_i = 1$, and this formula reduces to the familiar statement that convolution with a positive-definite function transforms into the operator on $L^2(\Gamma)$ of multiplication by a positive function.

Now let a data vector be given and considered as a random element in $L^2(G)$. We ask: when does the group transform $F_G$ decorrelate this data? In other words, when is the covariance operator of $F_G(\text{data})$ a diagonal operator? Experience with classical transforms already warns us that this is a rather restrictive condition. For example, as is well known, the ordinary DFT (to be ‘officially’ defined in the next section) decorrelates a data vector if and only if the covariance matrix is a circulant. That result follows from the expression of a circulant as a polynomial in the shift (mod $N$) operator on $\mathbb{C}^N$.
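
The classical DFT statement can be illustrated numerically (an illustration, not a proof). The NumPy sketch below builds a circulant covariance from a circular autocorrelation of random data, checks that the unitary DFT matrix diagonalizes it with eigenvalues given by the DFT of its first column, and shows that a Toeplitz but non-circulant covariance ($0.9^{|m-n|}$, an arbitrary choice) is not diagonalized exactly.

```python
import numpy as np

N = 8
rng = np.random.default_rng(2)
x = rng.standard_normal(N)

# Circulant covariance C[m, n] = c[(m - n) mod N], c a circular autocorrelation.
c = np.array([np.mean(x * np.roll(x, -k)) for k in range(N)])
C = np.array([[c[(m - n) % N] for n in range(N)] for m in range(N)])

# Unitary DFT matrix F[j, n] = exp(-2 pi i j n / N) / sqrt(N).
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)

D = F @ C @ F.conj().T
assert np.allclose(D, np.diag(np.diag(D)))      # transformed data are uncorrelated
assert np.allclose(np.diag(D), np.fft.fft(c))   # variances = DFT of first column

# A Toeplitz (stationary-looking) but non-circulant covariance is not diagonalized.
Tcov = 0.9 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
DT = F @ Tcov @ F.conj().T
off_diag = np.abs(DT - np.diag(np.diag(DT))).max()
print(f"largest off-diagonal entry: {off_diag:.3f}")
```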

In the general case, let the data be a realization of the weakly stationary process $\{x_g : g \in G\}$ with covariance function $\rho$ as defined in Equation (III.35), and $E(x_g) = 0$, $g \in G$. Then the covariance operator on $L^2(G)$ is (essentially) the operator of right convolution with $\rho$. That is, for $f \in L^2(G)$ interpreted as a linear functional of the data,

$$\operatorname{var}(f) = E\left(|\langle x_g, f \rangle|^2\right) = N^2 \langle f, f * \rho \rangle . \qquad \text{(III.38)}$$

Here $N = \operatorname{ord}(G)$ as usual, and its appearance is due to the basic definition of convolution and the choice of Haar measure on $G$. Also note, notationally, that the inner products in Equation (III.38) refer to $L^2(G)$ and not to a space of random variables, as in Equation (II.1) or (III.36).

Equation (III.38) gives the statistical interpretation of Equation (III.37), and from the latter we also see the form of the diagonal operator $R$ which serves as the covariance operator of the transformed data. Namely, each component of $R$ in the decomposition of Equation (III.26) has the form $d_i T_i(\rho)$, and each of these is a positive semidefinite operator on $V_i$, $i = 1, \ldots, r$. There is therefore an eigenvector frame $\{e_j^{(i)} : j = 1, \ldots, d_i\}$ in each $V_i$ for the operators $T_i(\rho)$, and hence the operator $R : A \mapsto AR$ of right multiplication is diagonal with respect to the frame

$$\{ e_j^{(i)} \otimes e_k^{(i)} : j, k = 1, \ldots, d_i,\ i = 1, \ldots, r \} .$$
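
For a nonabelian example, this diagonalization predicts that the spectrum of right convolution with a positive-definite $\rho$ consists of the eigenvalues of each $T_i(\rho)$, each repeated $d_i$ times. The sketch below checks this for $S_3$, assuming our own permutation model of the group and its irreducible representations, taking $\rho = h^* * h$ for a random $h$ (a hermitian square, hence positive-definite), and ignoring the constant $N^2$ of (III.38). The helper names are hypothetical.

```python
import itertools
import numpy as np

# Illustrative model of S_3: permutations of (0, 1, 2), (a*b)(k) = a(b(k)).
G = list(itertools.permutations(range(3)))
N = len(G)
idx = {g: k for k, g in enumerate(G)}

def mul(a, b):
    return tuple(a[b[k]] for k in range(3))

def inv(a):
    return tuple(sorted(range(3), key=lambda k: a[k]))

def perm_matrix(a):
    P = np.zeros((3, 3))
    P[list(a), np.arange(3)] = 1.0  # P e_k = e_{a(k)}
    return P

B = np.column_stack([[1, -1, 0], [1, 1, -2]]) / np.array([np.sqrt(2), np.sqrt(6)])
REPS = {g: [np.eye(1), np.array([[np.linalg.det(perm_matrix(g))]]),
            B.T @ perm_matrix(g) @ B] for g in G}
d = [1, 1, 2]

def conv(f, h):
    """(f*h)(x) = (1/N) sum_g f(g) h(g^{-1} x)."""
    return np.array([sum(f[idx[g]] * h[idx[mul(inv(g), x)]] for g in G) / N
                     for x in G])

def transform(f):
    """(T_1(f), ..., T_r(f)) with T_i(f) = (1/N) sum_g f(g) T_i(g)."""
    return [sum(f[idx[g]] * REPS[g][i] for g in G) / N for i in range(3)]

rng = np.random.default_rng(4)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h_star = np.array([np.conj(h[idx[inv(g)]]) for g in G])
rho = conv(h_star, h)  # hermitian square h* * h, hence positive-definite

# Matrix of right convolution f -> f * rho (rows indexed by x, columns by g).
K = np.array([[rho[idx[mul(inv(g), x)]] / N for g in G] for x in G])
eig_K = np.sort(np.linalg.eigvalsh(K))

# Predicted spectrum: eigenvalues of each T_i(rho), each repeated d_i times.
rho_hat = transform(rho)
eig_pred = np.sort(np.concatenate(
    [np.repeat(np.linalg.eigvalsh(rho_hat[i]), d[i]) for i in range(3)]))

assert np.allclose(K, K.conj().T)
assert np.allclose(eig_K, eig_pred)
assert eig_K.min() > -1e-12
```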

Once again, in the more familiar abelian case, there is a particularly nice version of Equation (III.38). Namely, in terms of the spectral measure $\mu$ on $\Gamma$,

$$\operatorname{var}(f) = \int_\Gamma |\hat f|^2 \, d\mu .$$

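What has just been shown is that the group transform $F_G$ decorrelates any weakly stationary data $\{x_g : g \in G\}$, in the sense that the covariance operator of $\{F_G(x_g)\}$ is a diagonal operator. This argument can be run backwards: if the transformed data has a diagonal covariance, then by our general theory the original covariance operator $C$ is right convolution with a positive-definite function $\rho$. That is, with the normalization of Equation (III.38),

$$\langle C a, b \rangle = N^2 \langle a * \rho, b \rangle$$

for $a, b \in L^2(G)$. Expanding both sides, we have

$$\langle C a, b \rangle = E\left(\overline{\langle x_g, a \rangle}\, \langle x_g, b \rangle\right) = \sum_{g} \sum_{h} a_g\, \overline{b_h}\, E(\overline{x_g}\, x_h)$$

and

$$N^2 \langle a * \rho, b \rangle = \sum_{g} \sum_{h} \rho(g^{-1} h)\, a_g\, \overline{b_h} ,$$

showing that

$$E(x_h\, \overline{x_g}) = \rho(g^{-1} h) , \qquad g, h \in G,$$

and hence that $\{x_g : g \in G\}$ is weakly stationary.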

Another way to phrase this conclusion is that the covariance operator of the data should be a $G$-circulant, that is, diagonal in the frame basis for $L^2(G)$ defined by the group transform. When $G$ is abelian, this amounts to saying that this operator should be diagonal in the character basis for $L^2(G)$. This specializes to the classical case when $G$ is cyclic and the group transform is the ordinary DFT: the latter will decorrelate a random vector if and only if the covariance matrix is a circulant.


III.4  TRANSFORM  COMPLEXITY 


Let’s summarize where we stand in this survey of the use of finite groups for discrete data processing. In Section III.1 we discussed the general rationale of taking unitary transforms of a data vector. Then we noted that such a vector could be realized as an element of the group algebra of a group whose order coincides with the data blocklength. This permitted us to bring to bear the general structure theory of discrete group algebras and, in particular, to define the associated group transform as a unitary operator. It is this type of operator that will be our focus for the remainder of this report; however, in keeping with its general level, we will continue, for the most part, to avoid great detail in specific examples. That is more properly deferred to a narrower and more specialized study.

In earlier sections we have occasionally made reference to the notion of a ‘fast’ unitary transform (FUT) without attempting a definition. The general subject of fast transforms has been vigorously developed since 1965, when the Cooley-Tukey FFT algorithm appeared [27]; a fairly current view of the state of the art is given in the book [28]. Here we will just recognize two rather general approaches to the problem, which is, in essence, simply the efficient computation of the matrix-vector product $Ux$, where $x$ may be of rather large dimension (e.g., several hundred or thousand). One is a somewhat ad hoc collection of rules for manipulating the rows and columns of a unitary matrix so as to preserve its unitary nature, and for building new, larger unitary matrices from sets of smaller ones by recursive application of the Kronecker product operation. This methodology is described by Fino and Algazi [47]. The second approach is through the use of group theory (naturally restricted to group transforms!) and is sketched out next.
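
One concrete case where the two approaches meet is the Walsh-Hadamard transform, which is the group transform of $G = (\mathbb{Z}_2)^n$. Its matrix is an $n$-fold Kronecker product of the $2 \times 2$ matrix with rows $(1, 1)$ and $(1, -1)$, and the product structure yields a butterfly algorithm using $N \log_2 N$ additions and subtractions instead of roughly $N^2$ operations. The sketch below is our illustration only, not the Fino-Algazi construction itself; `fwht` is a hypothetical name, and the $1/N$ normalization follows (III.24).

```python
from functools import reduce

import numpy as np

def fwht(f):
    """Group transform of G = (Z_2)^n with the 1/N normalization of (III.24):
    f_hat(chi_k) = (1/N) sum_g f(g) (-1)^{popcount(k & g)}.
    Uses log2(N) butterfly stages of N additions/subtractions each."""
    a = np.array(f, dtype=float)
    N = len(a)
    h = 1
    while h < N:
        for start in range(0, N, 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a / N

n = 3
N = 2 ** n
H1 = np.array([[1.0, 1.0], [1.0, -1.0]])
H = reduce(np.kron, [H1] * n)   # Kronecker-product construction of the N x N matrix

# The Kronecker product reproduces the character table of (Z_2)^n.
chars = np.array([[(-1) ** bin(k & g).count("1") for g in range(N)] for k in range(N)])
assert np.array_equal(H, chars)

f = np.random.default_rng(5).standard_normal(N)
assert np.allclose(fwht(f), H @ f / N)                     # fast == dense
assert np.isclose(np.mean(f ** 2), np.sum(fwht(f) ** 2))   # unitarity (III.30)
```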

So our problem is the computational complexity of the group transform operation

$$\hat f = F_G(f) ,$$

where $f \in L^2(G)$ and $G$ is some group of order $N$. Let’s see what’s involved if we just proceed from the definitions by brute force. We’ll first consider the relatively simpler case where $G$ is abelian and then have a quick look at the general case.

Recall that when $G$ is abelian the dual group $\Gamma$, also of order $N$, consists of $N$ independent characters $\chi_1, \ldots, \chi_N$, which form a frame in $L^2(G)$. In this case the group transform is the unitary map from $L^2(G)$ to $L^2(\Gamma)$ defined, according to Equation (III.24), by

$$\hat f(\chi_k) = \frac{1}{N} \sum_{g \in G} f(g)\, \chi_k(g), \qquad k = 1, \ldots, N. \tag{III.39}$$

If we employ as a conventional measure of computational complexity the number of (complex) multiplications and additions, we see that directly computing $\hat f$ from Equation (III.39) takes on the order of $N^2$ multiplications and $N(N - 1)$ additions (approximately, since one of the $\chi$'s is identically one; we also ignore the $1/N$ factor). So we may say that $F_G$ is a ‘fast transform’ if there is a numerical procedure for carrying out the computations in Equation (III.39) that requires ‘significantly’ fewer than $O(N^2)$ multiplications and additions. This definition is necessarily a little imprecise: some algorithms may be faster than others.
Now the key point is that whether or not a given group has a fast transform and, if so, how fast it is [relative to the $O(N^2)$ benchmark], depends on the subgroup structure of the group. Suppose that $G$ is both abelian and decomposable, that is,

$$G = G_1 \times G_2 \tag{III.40}$$

for a pair of subgroups $G_1$ and $G_2$, of orders $s$ and $t$, respectively. Then $st = N$ and

$$\Gamma \cong \Gamma_1 \times \Gamma_2, \tag{III.41}$$

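where $\cong$ means ‘isomorphic to’ and $\Gamma_1$, $\Gamma_2$ are the dual groups of $G_1$, $G_2$. This isomorphism is accomplished by $\chi \mapsto (\varphi, \psi)$, where $\chi(g) = \chi[(g_1, g_2)] = \varphi(g_1)\,\psi(g_2)$ for $g = (g_1, g_2) \in G$, $\varphi \in \Gamma_1$, $\psi \in \Gamma_2$. Hence we can rewrite the sum in Equation (III.39) as

$$N \hat f(\chi) = \sum_{g \in G} f(g)\, \chi(g) = \sum_{g_2 \in G_2} \Big[ \sum_{g_1 \in G_1} f[(g_1, g_2)]\, \varphi(g_1) \Big] \psi(g_2)$$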

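and observe that the right-hand side can be evaluated with a total of $(s - 1) + (t - 1)$ complex multiplications and additions per character, provided each bracketed inner sum is computed once and shared by the $t$ characters $\chi$ with the same $\varphi$; over all $N$ characters this gives about $N(s + t - 2)$ operations, compared with $O(N^2)$ for the direct sum.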
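A small numerical check of this regrouping, in Python. It assumes, for concreteness only, that $G_1$ and $G_2$ are the cyclic groups $\mathbb{Z}_s$ and $\mathbb{Z}_t$ (integers mod $s$ and $t$) with characters $\varphi_a(g_1) = w_s^{a g_1}$ and $\psi_b(g_2) = w_t^{b g_2}$, $w_n = \exp(-2\pi i/n)$; the function names are hypothetical. Both functions return $N\hat f(\chi)$ for every character $\chi = (\varphi_a, \psi_b)$, the second one by computing the bracketed inner sums once and reusing them.

```python
import cmath
import math

def char(n, a, k):
    """Character a of Z_n evaluated at k: w_n^(a*k), with w_n = exp(-2*pi*i/n)."""
    return cmath.exp(-2j * math.pi * a * k / n)

def transform_direct(f, s, t):
    """N * fhat(chi) for chi = (phi_a, psi_b), summing over all N = s*t elements of G."""
    return {(a, b): sum(f[(g1, g2)] * char(s, a, g1) * char(t, b, g2)
                        for g1 in range(s) for g2 in range(t))
            for a in range(s) for b in range(t)}

def transform_separable(f, s, t):
    """Same values, using chi(g1, g2) = phi(g1) psi(g2) to split the sum."""
    # Inner sums over G1: one per (phi_a, g2), shared by the t characters (phi_a, psi_b).
    inner = {(a, g2): sum(f[(g1, g2)] * char(s, a, g1) for g1 in range(s))
             for a in range(s) for g2 in range(t)}
    # Outer sums over G2, one per character.
    return {(a, b): sum(inner[(a, g2)] * char(t, b, g2) for g2 in range(t))
            for a in range(s) for b in range(t)}
```

The inner dictionary holds $st = N$ entries, each costing about $s$ operations, and the outer pass costs about $t$ per character, which is the $N(s + t - 2)$ count up to lower-order terms.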

If one of the subgroups $G_1$, $G_2$ is itself decomposable, then the above process can be repeated. We conclude that if the abelian group $G$ of order $N$ factors as

$$G = G_1 \times \cdots \times G_n$$

with $\operatorname{ord}(G_i) = N_i$, then $\hat f$ can be computed in about

$$N \sum_{i=1}^{n} (N_i - 1)$$

complex multiplications and additions. In the important special case when all the groups $G_i$ are isomorphic and of order $p$, the operations count above reduces to

$$(p - 1)\, N \log_p N.$$

This value is particularly familiar when $p = 2$, and motivates the practical interest in working with data defined over groups whose order is a power of 2. In this context note that the optimization problem

$$\min \{\, x_1 + \cdots + x_n : x_1 \cdots x_n = N,\ x_i \ge 0 \,\}$$

is solved when all the $x_i = N^{1/n}$.
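The operation counts above are easy to tabulate. This small Python helper (our own, for illustration) evaluates $N \sum_i (N_i - 1)$ for a few factorizations of $N = 64$; the equal-factor cases agree with $(p - 1) N \log_p N$, and the one-factor case reproduces the brute-force addition count $N(N - 1)$.

```python
import math

def op_count(orders):
    """Approximate complex multiplications (and additions) for the group transform
    on G = G_1 x ... x G_n, where orders[i] = ord(G_i): N * sum(N_i - 1)."""
    N = math.prod(orders)
    return N * sum(n - 1 for n in orders)

def equal_factor_count(p, N):
    """(p - 1) N log_p N: the special case where every factor has order p."""
    n = round(math.log(N, p))
    if p ** n != N:
        raise ValueError("N must be a power of p")
    return (p - 1) * N * n

for orders in ([64], [8, 8], [4, 4, 4], [2] * 6, [16, 4]):
    print(orders, op_count(orders))   # prints 4032, 896, 576, 384, 1152 in turn
```

The finest factorization into factors of order 2 gives the smallest count, in line with the remark on the optimization problem.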

Before looking at some examples and cases where the group does not factor, we should first note that quite analogous reasoning establishes a similar reduction in complexity for the group transform over certain nonabelian groups. The basic assumption, once again, is the decomposability of the given group. Thus, as before, assume that $G$ is of order $N$ and that Equation (III.40) holds, with $G_1$ and $G_2$ of orders $N_1$ and $N_2$. Let $\{T_1, \ldots, T_r\}$ be a choice of irreducible representations from the classes of $\Gamma$. A fast algorithm for $F_G$ depends on the proper generalization of Equation (III.41). This is achieved through use of the tensor product notion via the correspondence $T \leftrightarrow R \otimes S$, where $R$, $S$ are irreducible representations of $G_1$, $G_2$ on Hilbert spaces $U$, $V$, respectively. This means that if $g = (g_1, g_2) \in G$, $T(g)$ is that operator on $U \otimes V$ whose value at $u \otimes v$ is $R(g_1)u \otimes S(g_2)v$.
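In matrix form the tensor product becomes the Kronecker product. The sketch below (Python, standard library only) uses a concrete example of our own choosing: $G_1 = D_3$, the symmetries of a triangle (isomorphic to $S_3$), with its two-dimensional irreducible representation $R$ by rotation and reflection matrices, and $G_2 = C_2$ with the one-dimensional representation $S(x) = (-1)^x$. All function names are hypothetical. It lets one check that $T(g) = R(g_1) \otimes S(g_2)$ is a homomorphism on $G = D_3 \times C_2$ and that its character has norm one, $\frac{1}{|G|}\sum_g |\operatorname{tr} T(g)|^2 = 1$, the standard test for irreducibility.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kron(A, B):
    """Kronecker product of square matrices: the matrix form of the tensor product."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def rot(k):
    """Rotation by 2*pi*k/3."""
    th = 2 * math.pi * k / 3
    return [[math.cos(th), -math.sin(th)], [math.sin(th), math.cos(th)]]

REF = [[1.0, 0.0], [0.0, -1.0]]  # matrix of the reflection s

def R(g1):
    """2-dim irreducible representation of D3; g1 = (k, j) stands for r^k s^j."""
    k, j = g1
    return mat_mul(rot(k), REF) if j else rot(k)

def d3_mul(g, h):
    """(r^a s^b)(r^c s^d) = r^(a + (-1)^b c) s^(b + d), using s r = r^(-1) s."""
    (a, b), (c, d) = g, h
    return ((a + (c if b == 0 else -c)) % 3, (b + d) % 2)

def S(g2):
    """Nontrivial 1-dim representation of C2 = {0, 1}."""
    return [[(-1.0) ** g2]]

def T(g):
    """T(g1, g2) = R(g1) kron S(g2), acting on U tensor V (here 2 x 1 = 2 dims)."""
    return kron(R(g[0]), S(g[1]))

def mul(g, h):
    """Componentwise multiplication in G = D3 x C2."""
    return (d3_mul(g[0], h[0]), (g[1] + h[1]) % 2)

G = [((k, j), x) for k in range(3) for j in range(2) for x in range(2)]
```

Here $d_U = 2$ and $d_V = 1$, so $T$ has dimension $d_U d_V = 2$, as the correspondence requires.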

Now the ‘brute force’ complexity of $F_G$ can be obtained from the definition (III.24) with $T = T_1, \ldots, T_r$ successively, and Burnside's formula (III.16). We find a total of $N d_i^2 - d_i + 1$ multiplications and $(N - 1) d_i^2$ additions for each $i$, $1 \le i \le r$, and therefore a total of $N^2 - \sum_i d_i + r$ multiplications and $N(N - 1)$ additions; hence, just as in the abelian case, $O(N^2)$ operations in all. These counts are obtained by treating each $T_i(g)$ as a $d_i \times d_i$ matrix.

To get a fast algorithm for $F_G$ we fix $i$, $1 \le i \le r$, and write

$$N\, T_i(f) = \sum_{g_2 \in G_2} \Big[ \sum_{g_1 \in G_1} f(g_1, g_2)\, R(g_1) \Big] \otimes S(g_2) \tag{III.42}$$

in accord with the correspondence indicated above between irreducible representations of $G$ and those of $G_1$ and $G_2$. The Hilbert spaces on which the operators $R(g_1)$ and $S(g_2)$ act are denoted $U$, $V$, respectively, with dimensions $d_U$ and $d_V$. We have $\sum d_U^2 = N_1$ and $\sum d_V^2 = N_2$, where the sums are extended over the classes of irreducible representations of $G_1$ and $G_2$; also $d_U d_V = d_i$, the dimension of $T_i$. Now we work through the arithmetic in Equation (III.42), beginning with the inner sum. From the preceding paragraph we note $N_1 d_U^2 - d_U$ multiplications and $(N_1 - 1) d_U^2$ additions. Then the tensor product requires a further $N_2 d_U^2 d_V^2$ multiplications and $(N_2 - 1) d_U^2 d_V^2$ additions (working with matrix forms of the operators, where the tensor product goes over to a Kronecker product). Finally, we have to repeat this for all possible choices of $R$, $S$ as we move through $\Gamma_1$ and $\Gamma_2$. Doing so over the $R$'s first results in at most $N_2 N_1 d_V^2 + N_1^2$ multiplications; then counting all the $S$ terms yields a final total of at most $N_1 N_2^2 + N_2 N_1^2 = N(N_1 + N_2)$ multiplications, to which we could add $N$ more because of the factor on the left side of Equation (III.42). Similarly we find a total of $N(N_1 + N_2 - 2)$ additions.

Thus the complexity of the group transform for decomposable nonabelian groups is about the same as for abelian groups of the same order. So, if the group factors into a product of more than two subgroups, we can proceed just as before, down to the best result of $O(N \log_p N)$ operations, if the group permits that much factorization.

At this point the question of fast inverse transforms naturally arises. In the case of abelian groups it is clear, by duality, that the inverse transform is of the same complexity as the direct transform. That is, any factorization of $G$ results in an analogous factorization of the dual group $\Gamma$, as indicated by Equation (III.41). The corresponding result for nonabelian groups is not so obvious, as the inverse transform (III.29) is not of the same form as the direct transform. Nevertheless, it has been shown [29] that the complexity, as we are measuring it, is the same for the inverse transform in the nonabelian case too.

From now on we have to be more specific to deal further with fast transforms. In considering a particular transform on some group $G$, we first have to see whether $G$ is decomposable, in the sense that a factorization of the form (III.40) is possible, and, if not, whether some other procedure can be effective. This whole circle of questions pertains to the subgroup structure of particular groups, and a successful resolution will require various additional assumptions about $G$.


Let's begin by stressing the role of cyclic groups. The cyclic group $C_N$ of order $N$ is abstractly defined by a single generator $a$ and the relation $a^N = e$. It is realized in various concrete ways, for example as

(a) the subgroup of the circle group consisting of the $N$th roots of unity;

(b) the quotient group $\mathbb{Z}/N\mathbb{Z}$, where $\mathbb{Z}$ is the integer group;

(c) the integers $0, 1, \ldots, N - 1$ with addition mod $N$.

These are the most elementary examples of abelian groups, yet even these may well not be decomposable. Indeed, if $p$ is a prime then $C_p$ is a simple group, and $C_{p^n}$ is indecomposable (but not simple if $n \ge 2$). On the other hand, $C_N$ is decomposable, $C_N \cong C_m \times C_n$, whenever $N = mn$ with $(m, n) = 1$ and $m, n > 1$.
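Realization (c) makes the decomposability criterion easy to test by brute force. The Python sketch below (helper names are ours) checks two equivalent conditions for small $m$, $n$: whether the remainder map $k \mapsto (k \bmod m, k \bmod n)$ is a bijection of $\mathbb{Z}_{mn}$ onto $\mathbb{Z}_m \times \mathbb{Z}_n$ (the Chinese remainder theorem), and whether $\mathbb{Z}_m \times \mathbb{Z}_n$ contains an element of order $mn$. Both hold exactly when $(m, n) = 1$.

```python
import math

def crt_is_bijective(m, n):
    """Is k -> (k mod m, k mod n) a one-to-one map of Z_mn onto Z_m x Z_n?
    The map is always a homomorphism, so bijective means isomorphism."""
    return len({(k % m, k % n) for k in range(m * n)}) == m * n

def product_is_cyclic(m, n):
    """Z_m x Z_n is cyclic iff some element has order m*n; the order of
    (a, b) is lcm(m / gcd(a, m), n / gcd(b, n))."""
    return any(math.lcm(m // math.gcd(a, m), n // math.gcd(b, n)) == m * n
               for a in range(m) for b in range(n))
```

For instance, both functions return `False` for $m = n = 2$, since every element of $C_2 \times C_2$ has order at most 2.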

We note an initial connection between representations and cyclic groups, realized as in part (a) above. Namely, let $G$ be a group of order $N$ and $T$ a finite-dimensional representation. Then for any $g \in G$,

$$I = T(e) = T(g^N) = T(g)^N,$$

showing that the spectrum of $T(g)$ is contained in the set of $N$th roots of unity, that is, in $C_N$ realized as in (a). In particular, if $G$ is abelian then all characters on $G$ assume values in $C_N$.

The cyclic groups are important in our subject primarily because those of prime power order are the building blocks of the general abelian group. In fact, the Fundamental Theorem for Abelian Groups permits us to describe all abelian groups of a given order. We recall the two-part statement, given an abelian group $G$ of order $N$. First, if $N = p_1^{a_1} \cdots p_n^{a_n}$ is the prime factorization of $N$, then $G = G_1 \times \cdots \times G_n$, where $G_i$ is the subgroup of elements of order $p_i^t$, $t \le a_i$; the order of $G_i$ is $p_i^{a_i}$, and this decomposition into subgroups of prime power order is unique. These subgroups $G_i$ are called the Sylow subgroups of $G$. Second, each such ‘$p$-group’ is isomorphic to a product of cyclic $p$-groups. In fact, if $H$ is any abelian group of order $p^m$, $p$ a prime, then there is a unique list of integers $\{m_1, \ldots, m_r\}$, with $m_1 \ge \cdots \ge m_r \ge 1$ and $m_1 + \cdots + m_r = m$, called the type of $H$, such that

$$H \cong C_{p^{m_1}} \times \cdots \times C_{p^{m_r}}.$$

This theorem is actually a special case of the cyclic decomposition of a finitely generated module over a principal ideal domain (e.g., [30, Chapter XV]), but, of course, it can be established more directly (e.g., [30, Chapter I]).

This theorem permits us to factor any finite abelian group into indecomposable (cyclic) factors and therefore to describe all abelian groups of a given order. For example, the following Table III-1 displays the distinct abelian groups of certain low orders in factored form, and also indicates the total number of (nonisomorphic) groups, abelian or not, of that order.
TABLE III-1

Abelian Groups of Low Order

Order   Abelian Groups                                          Total Groups
  4     C_4, (C_2)^2                                                  2
  6     C_6                                                           2
  8     C_8, C_4 × C_2, (C_2)^3                                       5
  9     C_9, (C_3)^2                                                  2
 10     C_10                                                          2
 12     C_12, C_6 × C_2                                               5
 16     C_16, C_8 × C_2, C_4 × C_4, C_4 × (C_2)^2, (C_2)^4           14
 24     C_24, C_12 × C_2, C_6 × (C_2)^2                              15
 32     C_32, C_16 × C_2, ..., (C_2)^5                               51
 64     C_64, C_32 × C_2, ..., (C_2)^6                              267(1)
 96     C_32 × C_3, ..., (C_2)^5 × C_3                              230
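The abelian-group column of Table III-1 can be regenerated from the Fundamental Theorem: choose a type, that is, a partition of the exponent $a_i$, independently for each Sylow subgroup. A minimal Python sketch (our own helper names; trial-division factoring, adequate only for small orders) lists each group by the orders of its cyclic prime-power factors, so $C_{12}$ appears as `(4, 3)` and $C_6 \times C_2$ as `(2, 2, 3)`. In particular, the number of abelian groups of order $N$ is the product of the partition numbers of the exponents $a_i$.

```python
def prime_factorization(n):
    """Return {p: a} with n = product of p**a."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def partitions(a, largest=None):
    """Yield the partitions of a as nonincreasing tuples (m_1 >= ... >= m_r >= 1)."""
    if largest is None:
        largest = a
    if a == 0:
        yield ()
        return
    for first in range(min(a, largest), 0, -1):
        for rest in partitions(a - first, first):
            yield (first,) + rest

def abelian_groups(n):
    """Abelian groups of order n, each given by the orders of its cyclic
    prime-power factors: one type (partition) per Sylow subgroup."""
    result = [()]
    for p, a in sorted(prime_factorization(n).items()):
        types = [tuple(p ** m for m in part) for part in partitions(a)]
        result = [g + t for g in result for t in types]
    return result

print(abelian_groups(16))
# [(16,), (8, 2), (4, 4), (4, 2, 2), (2, 2, 2, 2)]
```

The five groups printed for order 16 are exactly those in the corresponding row of the table; the total-groups column, which counts nonabelian groups as well, is not reproduced by this approach.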

With these preliminaries aside, let's now return to the subject of fast group transforms. We see that if the underlying group is abelian, the issue has been reduced to the case where the group is cyclic of prime power order. We know that such groups are indecomposable; nevertheless, they still have lots of subgroups. In general, if $G$ is abelian and $m$ divides $\operatorname{ord}(G)$, then $G$ has a subgroup of order $m$. This fails, however, for some small nonabelian groups (a standard example appears later in this section). Now if $G$ is cyclic and $m$ divides $\operatorname{ord}(G)$, then there is exactly one (necessarily cyclic) subgroup $H$ of $G$ of order $m$ and, in fact, this statement is characteristic of cyclic groups [31]. We can describe $H$ explicitly in terms of any generator $g$ of $G$: $H = \{e, h, \ldots, h^{m-1}\}$, where $h = g^k$ and $mk = \operatorname{ord}(G)$. Clearly, when $G$ is cyclic of order $p^n$, the only possible values for $m = \operatorname{ord}(H)$ are $m = p^t$, $t \le n$.
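Using realization (c), the uniqueness of subgroups in a cyclic group can be checked directly. This Python sketch (hypothetical helper names) collects the subgroups $\langle k \rangle$ of $\mathbb{Z}_N$; since every subgroup of a cyclic group is cyclic, these are all of them. With the additive generator $g = 1$, the subgroup of order $m$ is generated by $h = N/m$, matching $h = g^k$ with $mk = \operatorname{ord}(G)$.

```python
def generated_subgroup(k, N):
    """The cyclic subgroup of Z_N (under addition mod N) generated by k."""
    return frozenset((j * k) % N for j in range(N))

def subgroups_of_cyclic(N):
    """All subgroups of Z_N, grouped by order.  Every subgroup of a cyclic
    group is cyclic, so it suffices to collect the subgroups <k>."""
    by_order = {}
    for H in {generated_subgroup(k, N) for k in range(N)}:
        by_order.setdefault(len(H), []).append(H)
    return by_order
```

For $N = 12$ the orders found are exactly the divisors 1, 2, 3, 4, 6, 12, with a single subgroup of each order.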

The following result now settles the specification of a fast algorithm for any group transform over a finite abelian group. Let $G$ be abelian of order $N$ and $H$ a subgroup of order $m$ and index $s$, so that $ms = N$. Then the complexity of the group transform on $G$ is essentially $O[N(m + s)]$. To see this, we partition $G$ into its cosets by $H$:

$$G = H \cup g_1 H \cup \cdots \cup g_{s-1} H,$$

and then write, for each $\chi \in \Gamma$,

$$\begin{aligned} N \hat f(\chi) &= \sum_{g \in G} f(g)\, \chi(g) \\ &= \sum_{h \in H} f(h)\, \chi(h) + \sum_{h \in H} f(g_1 h)\, \chi(g_1 h) + \cdots \\ &= \sum_{h \in H} \big[ f(h) + f(g_1 h)\, \chi(g_1) + \cdots + f(g_{s-1} h)\, \chi(g_{s-1}) \big]\, \chi(h). \end{aligned} \tag{III.43}$$

From this expansion it is clear that the bracketed sum involves $(s - 1)$ multiplications and additions. Then we see $(m - 1)$ more multiplications to accommodate the factors $\chi(h)$ (keeping in mind that $\chi(e) = 1$), and finally $(m - 1)$ further additions. Repeating this for all $N$ characters in $\Gamma$ results in $N(m + s - 2)$ multiplications and additions, as stated. Not surprisingly, this is the same figure as reported previously for decomposable groups. Naturally, this procedure can be repeated if $H$ has a proper subgroup.
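Because $\chi(g_j h) = \chi(g_j)\chi(h)$, the sum in (III.43) can equally be grouped with the sum over $H$ inside: $N\hat f(\chi) = \sum_j \chi(g_j) \sum_{h \in H} f(g_j h)\chi(h)$. In this form the inner sums depend on $\chi$ only through its restriction to $H$, which takes just $m$ distinct values, so they can be computed once and shared among characters. The Python sketch below assumes the cyclic case $G = \mathbb{Z}_N$, $H = \{0, s, 2s, \ldots\}$ of order $m$, and coset representatives $g_j = j$ ($0 \le j < s$); the function names are ours. This regrouping is the familiar decimation-in-time form of the Cooley-Tukey algorithm.

```python
import cmath

def w(n, e):
    """w_n^e with w_n = exp(-2*pi*i/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)

def direct(f):
    """N * fhat(chi_a), a = 0..N-1, by the brute-force O(N^2) sum over Z_N."""
    N = len(f)
    return [sum(f[x] * w(N, a * x) for x in range(N)) for a in range(N)]

def coset_transform(f, m):
    """Same values, splitting Z_N into the s = N/m cosets j + H of the subgroup
    H = {0, s, 2s, ..., (m-1)s} of order m (coset representatives g_j = j)."""
    N = len(f)
    if N % m:
        raise ValueError("m must divide N")
    s = N // m
    # chi_a(l*s) = w_m^(a*l) depends only on a mod m, so these s*m inner sums
    # (about N*m operations) serve every character.
    inner = [[sum(f[j + l * s] * w(m, b * l) for l in range(m)) for b in range(m)]
             for j in range(s)]
    # Outer sums over coset representatives: about N*s operations in all.
    return [sum(w(N, a * j) * inner[j][a % m] for j in range(s)) for a in range(N)]
```

The inner table costs about $sm \cdot m = Nm$ operations and the outer pass about $Ns$, giving the $O[N(m + s)]$ count; applying the same splitting recursively to the inner transforms is what yields the full FFT.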

The preceding algorithm is basically the Cooley-Tukey method [27], designed originally for the ordinary DFT (about which more momentarily). Its extension to the abelian group context is due to Cairns [32], a few years later. During the following decade there was extensive development of fast algorithms for the DFT and other discrete transforms, based on number theory and matrix representations. So the method we have displayed, while providing a cute application of elementary group theory, and while of both historical and practical significance, is not the last theoretical word in computational efficiency. We have in mind especially the work of Winograd [33] and Nussbaumer/Quandalle [34], which is aimed at further reduction of the number of multiplications required to compute the DFT. Chapter 5 of [28] is a good general reference, while Reference [29] of Chapter II discusses the Winograd algorithm from an advanced standpoint. In the author's opinion, these fancier methods have not had a major impact on the day-to-day practice of computing large DFTs.

Looking in the other (chronological) direction, we can note that the Cooley-Tukey procedure was not completely original or unprecedented. That their paper [27] had such an impact on digital signal processing was a matter of fortunate timing, reflecting both increasing appreciation of the uses of discrete Fourier and spectral analysis (physical chemistry, seismology, econometrics, etc.) and increasing computing power. The historical record of the classical FFT includes the names of I. Good (1958), G. Danielson and C. Lanczos (1942), C. Runge (1903), and probably, it is fair to say, Buys-Ballot (1847). Some of this historical perspective is given in [35].

Having now referred to the classical DFT several times, beginning in Section II.1 and most recently just above, let us place this most prominent of discrete linear transforms into our group context. This is quite easy and, appropriately, it is associated with the most prominent of abelian groups: the cyclic group. Suppose we start with an arbitrary finite cyclic group $C_N$ with generator $a$. Being abelian, its irreducible unitary representations are all one-dimensional and coincide with its group characters. These comprise the dual group $\Gamma = \{\chi_1, \ldots, \chi_N\}$. Let

$$w_N = \exp(-2\pi i/N).$$

Then the formula

$$\chi_m(a^k) = w_N^{km}, \qquad k = 0, 1, \ldots, N - 1,$$

defines a character $\chi_m$ for $m = 1, \ldots, N$. It is easy to see that these characters are distinct from each other, so that by the general theory we have all of them.

Now if $f = (f_0, \ldots, f_{N-1})$ is a function in $L^2(G)$, we apply the definition (III.24) of the group transform to obtain

$$\hat f(\chi_m) = \frac{1}{N} \sum_{k=0}^{N-1} f(a^k)\, \chi_m(a^k) = \frac{1}{N} \sum_{k=0}^{N-1} f_k\, w_N^{km}, \qquad m = 1, \ldots, N.$$

The right-hand side here is recognizable as the usual formula for the DFT (e.g., [28, Chapter 3]). We have thus shown the ordinary DFT to be the group transform associated with a cyclic group of appropriate order. The standard properties of the DFT, its inverse, and the FFT now follow routinely from our more general group-theoretic developments. Further, these developments also imply that the group transform on any finite abelian group is just a multidimensional DFT.
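Explicitly, given such a group $G$, we factor it as

$$G = C_{N_1} \times \cdots \times C_{N_s},$$

where each $N_j$ is some prime power and each $C_{N_j}$ has generator $a_j$. The dual group $\Gamma$ then factors similarly:

$$\Gamma \cong C_{N_1} \times \cdots \times C_{N_s},$$

and each character on $G$ is a product of characters on the associated cyclic groups. Hence the group transform on $G$ has the form

$$\hat f(\chi) = \hat f[(\chi_{m_1}, \ldots, \chi_{m_s})] = \frac{1}{N} \sum_{k} f_k\, w_{N_1}^{k_1 m_1} \cdots w_{N_s}^{k_s m_s},$$

where the summation index is $k = (k_1, \ldots, k_s)$, each $k_j$ runs from $0$ to $N_j - 1$, $f_k = f[(a_1^{k_1}, \ldots, a_s^{k_s})]$, and the indices $(m_1, \ldots, m_s)$ index the characters $\chi_{m_j}$ of $C_{N_j}$, with $1 \le m_j \le N_j$.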
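For the cyclic case, the following Python sketch (standard library only; function names are ours) builds the characters $\chi_m(a^k) = w_N^{km}$, $m = 1, \ldots, N$, forms $\hat f(\chi_m)$ with the $1/N$ factor used in Equation (III.39), and inverts it with the conjugate characters. With this normalization the Plancherel identity takes the form $\frac{1}{N}\sum_k |f_k|^2 = \sum_m |\hat f(\chi_m)|^2$; both it and the inversion formula can be checked numerically with these functions.

```python
import cmath
import math

def characters(N):
    """chi_m(a^k) = w_N^(k*m), w_N = exp(-2*pi*i/N), for m = 1..N (chi_N is trivial)."""
    w = cmath.exp(-2j * math.pi / N)
    return {m: [w ** ((k * m) % N) for k in range(N)] for m in range(1, N + 1)}

def group_transform(f):
    """fhat(chi_m) = (1/N) sum_k f_k chi_m(a^k): the DFT with a 1/N factor."""
    N = len(f)
    chars = characters(N)
    return {m: sum(fk * c for fk, c in zip(f, chars[m])) / N for m in chars}

def inverse_transform(fhat, N):
    """f_k = sum over m of fhat(chi_m) * conj(chi_m(a^k))  (Fourier inversion on C_N)."""
    chars = characters(N)
    return [sum(fhat[m] * chars[m][k].conjugate() for m in chars) for k in range(N)]
```

The indexing $m = 1, \ldots, N$ follows the text; most software libraries index the DFT by $m = 0, \ldots, N - 1$ instead, which is the same set of characters since $\chi_N = \chi_0$.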

In a slightly different direction, let's consider an integer $N$ of the form $N = p^n$, $p$ a prime. Let $G$ be the corresponding group

$$G = (C_p)^n, \tag{III.44}$$

the product of $n$ copies of $C_p$. We'll call such a group a ‘$p$-adic group’; in particular, a dyadic group when $p = 2$. Groups of this special structure occur frequently in applied mathematics. For example, they are exactly the possible additive groups of finite (Galois) fields, and hence are involved in treatments of algebraic coding theory. However, we shall refrain from such an excursion and stick to signal theory here.

Suppose we now realize $C_p$ as the set of integers $\{0, 1, \ldots, p - 1\}$. We can map $G$ in a one-to-one manner onto the set $S_N = \{0, 1, \ldots, N - 1\}$ by the rule

$$x = \sum_{i=1}^{n} x_i\, p^{n-i}, \tag{III.45}$$

where each $x_i \in C_p$. This permits us to move back and forth between the group algebra $L^2(G)$ and vectors $f = (f_0, \ldots, f_{N-1}) \in \mathbb{C}^N$, considered as functions on $S_N$. Now, as before, any character in $\Gamma$ can be factored into a product of $n$ characters, each a character of $C_p$. Hence each character $\chi_m \in \Gamma$ defines on $S_N$ a function of the form

$$\phi_m(x) = \prod_{i=1}^{n} w_p^{m_i x_i} = w_p^{\sum_{i=1}^{n} m_i x_i}, \tag{III.46}$$

where the integers $m_i$ and $x_i$ are defined from $m$ and $x$ in $S_N$ by the rule (III.45).

We have thus defined a family $\{\phi_m : m = 0, 1, \ldots, N - 1\}$ of functions on $S_N$ or, equivalently, a subset of $\mathbb{C}^N$, which is seen to be a group-frame in $\mathbb{C}^N$. Such a frame may be called a set of discrete Vilenkin-Chrestenson functions (cf. [36] and below). Two extreme cases are of special interest: the case $n = 1$, which is the DFT of prime order, and the case $p = 2$. This latter case involves the dyadic group of order $2^n$. Here $w_p = w_2 = -1$, and so $\phi_m(x) = \pm 1$ for every $m$ and $x$. (Of course, … discrete V-C systems.) The frames $B_1$ and $B_2$ introduced at the beginning of Section III.2 are the special cases of this construction when $N = 4$.
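A direct construction of the functions (III.46) in Python (helper names are ours): `digits` implements the rule (III.45), and `gram` forms the inner products $\sum_x \phi_m(x)\overline{\phi_k(x)}$, which equal $N$ when $m = k$ and $0$ otherwise, by the orthogonality of characters. For $p = 2$ every entry is $\pm 1$ up to floating-point rounding.

```python
import cmath
import math

def digits(x, p, n):
    """(x_1, ..., x_n) with x = sum_i x_i p^(n - i), as in rule (III.45)."""
    out = []
    for _ in range(n):
        out.append(x % p)
        x //= p
    return out[::-1]

def vc_functions(p, n):
    """Rows phi_m, m = 0..N-1, of discrete Vilenkin-Chrestenson functions on
    S_N = {0, ..., N-1}: phi_m(x) = w_p^(sum_i m_i x_i), w_p = exp(-2*pi*i/p)."""
    N = p ** n
    wp = cmath.exp(-2j * math.pi / p)
    return [[wp ** (sum(a * b for a, b in zip(digits(m, p, n), digits(x, p, n))) % p)
             for x in range(N)] for m in range(N)]

def gram(rows):
    """Inner products <phi_m, phi_k> = sum_x phi_m(x) * conj(phi_k(x))."""
    return [[sum(u * v.conjugate() for u, v in zip(r, s)) for s in rows] for r in rows]
```

For example, `vc_functions(2, 2)` gives, after rounding, the $\pm 1$ matrix with rows (1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1).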

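The transform corresponding to the dyadic group case ($p = 2$) is usually referred to as the Walsh-Hadamard transform (WHT) [28, Chapter 8]. In matrix terms it is defined by

$$\hat f = H_n f,$$

where $H_n = [u_{mk}]$ and

$$u_{mk} = (-1)^{\sum_{i=1}^{n} m_i k_i},$$

with the integers $m_i$, $k_i$, $i = 1, \ldots, n$, once more obtained from $m, k \in \{0, 1, \ldots, 2^n - 1\}$ by the rule (III.45). Thus, for example,

$$H_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

and

$$H_2 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}.$$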
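Because $H_n = H_1 \otimes H_{n-1}$ for the matrices defined above, $H_n f$ can be computed with the familiar butterfly scheme in $N \log_2 N$ additions and subtractions and no multiplications, in line with the count $(p - 1) N \log_p N$ for $p = 2$. A minimal Python sketch (our own helper names), with no $1/N$ normalization, as in the matrix definition above:

```python
def hadamard(n):
    """H_n = [u_mk] with u_mk = (-1)^(sum_i m_i k_i); the exponent counts the
    binary digit positions in which m and k both have a 1."""
    N = 2 ** n
    return [[(-1) ** bin(m & k).count("1") for k in range(N)] for m in range(N)]

def fwht(f):
    """Fast Walsh-Hadamard transform: returns H_n f using N log2 N additions
    and subtractions (len(f) must be a power of 2)."""
    a = list(f)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                # butterfly: replace the pair (x, y) by (x + y, x - y)
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a
```

For example, `fwht([1, 0, 0, 0])` returns `[1, 1, 1, 1]`, the first column of $H_2$.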


This completes our examples of specific group transforms over abelian groups. Readers may consult the literature [1, 28, 36] for more details, many of which clearly follow, in unified fashion, from our group-theoretical approach. And, as we recall from the end of Section III.1, there are many other unitary transforms which do not arise as group transforms.


Before leaving this topic we want to make a few final remarks. First, for any fixed $N$, consider the set of abelian groups of order $N$ and the associated group algebras, which we have been using as sample spaces. Being $N$-dimensional, these algebras are certainly equivalent as Hilbert spaces. But, as a consequence of a result of Kellogg [37] about certain commutative H*-algebras, they are actually isometrically *-isomorphic as H*-algebras. Therefore, the associated group transforms, which preserve all this structure, are strongly equivalent as representations of the algebras. Based on this observation, the structure theorem for finite abelian groups, and the fast algorithms which ensue from this structure, there would seem to be little basis for preferring one abelian group transform to another. Here is a specific instance of the general issue about data processing raised back in Section I.2: why do we perform one operation on data and not another?


The answer to this question must come from outside the mathematics; specifically, from the nature of the underlying signal and from the goal of the processing. For instance, the goal may be Wiener filtering or signal decorrelation. For filtering we refer to the next section. For decorrelation, the signal statistics and performance criterion must be specified. An example of the latter might be


where $r_T$ is the quantity defined by Equation (III.10) and $r^*$ is the corresponding quantity for the original untransformed covariance matrix. Such measures of transform efficiency can be compared for various group or other unitary transforms and data covariances. When the latter are taken to be those associated with Markov-1 processes, for instance, it is found that the DFT and WHT have similar efficiencies for decorrelation (and also for signal coding), with a small advantage to the WHT [1, Chapter 3].

There is also the matter of the nature of the underlying signal from which, via sampling and other preprocessing steps, we have obtained our $N$-dimensional data vector. Although it would lead us too far afield to pursue this matter seriously, its importance requires us at least to indicate the issues. It is tied up with the problem of asymptotics already discussed in Section III.1. Here we will, as usual, emphasize the group-theoretic aspects.
We presume that our data sample has arisen from observations on some stochastic process indexed by an infinite group, such as the integers, the reals, etc. The covariance function of this process naturally determines the covariance matrix of the data vector, and there is nothing new to add to this aspect of the situation. Instead, what we want to think about is the nature of the sample functions. As already noted in Section II.3, there is little loss of generality in considering these functions to belong to a (weighted) $L^2$ space. There is then the question of an appropriate orthonormal coordinate system (or frame, again). We expect there to be a relation between the signal paths, a frame for the path space, and the finite frame which defines the unitary transform applied to the data vector. In particular, as a group transform is associated with a group-frame, we might ask about group-frames in the path space. This is really just the same issue as was raised early in this chapter, except that here it is being extended to an infinite-dimensional context. A complete theory will involve the suitability of the group-frame in signal space, which is basically a matter of approximation theory, and the expression of the attached group as a limit of finite groups, which is another kind of approximation. The suitability of a frame may also involve the statistical behavior of its coordinates as this derives from the assumptions about the underlying process.

As an example, suppose that this process is defined over a finite interval of the real line. Extending it periodically, we can view it and its sample functions as defined on the circle group $\mathbb{T}$. Under its usual group structure $\mathbb{T}$ is compact abelian, with dual group $\hat{\mathbb{T}}$ isomorphic to the integer group $\mathbb{Z}$ by the correspondence $n \leftrightarrow \exp(int)$. Hence expansion of functions in $L^2(\mathbb{T})$ is just classical Fourier series. Realizing the cyclic group as a subset of $\mathbb{T}$, the DFT appears as a sampled version of the usual Fourier expansion. This, of course, is well known. But $\mathbb{T}$ can be approximated by other finite groups. For instance, we can let $n \to \infty$ in the definition (III.44) of $p$-adic groups, and obtain

$$C_p^\infty = C_p \times C_p \times \cdots,$$

a compact abelian group in the product topology, with Haar measure equal to the product of the discrete (normalized) Haar measures on each $C_p$. By using base-$p$ expansions we can define, almost everywhere on $C_p^\infty$, a one-to-one transformation onto $\mathbb{T}$ which preserves Haar measure (recall that, on $\mathbb{T}$, Haar measure is just normalized Lebesgue measure). If this transformation is employed to transfer the characters of $C_p^\infty$ over to $\mathbb{T}$, we obtain a frame in $L^2(\mathbb{T})$, the generalized Walsh functions of Chrestenson [38], with the ordinary Walsh functions resulting when $p = 2$.
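A sketch of the case $p = 2$ in Python, assuming the Paley indexing $n = \sum_k n_k 2^{k-1}$ (one common convention; other orderings of the Walsh system exist) and identifying $\mathbb{T}$ with $[0, 1)$. A character of $C_2^\infty$ depends on finitely many coordinates, and transferring it to $\mathbb{T}$ through the binary expansion $t = \sum_k t_k 2^{-k}$ gives $w_n(t) = (-1)^{\sum_k n_k t_k}$. Since $w_n$ for $n < 2^K$ is constant on dyadic intervals of length $2^{-K}$, the $L^2(\mathbb{T})$ inner products can be computed exactly by summing over those intervals. The function names are ours.

```python
def walsh(n, j, K):
    """Paley-ordered Walsh function w_n on the dyadic interval [j/2^K, (j+1)/2^K),
    for 0 <= n < 2^K: w_n(t) = (-1)^(sum_k n_k t_k), where n = sum_k n_k 2^(k-1)
    and t = sum_k t_k 2^(-k) is the binary expansion of t."""
    s = 0
    for k in range(1, K + 1):
        n_k = (n >> (k - 1)) & 1
        t_k = (j >> (K - k)) & 1  # k-th binary digit of every t in the interval
        s += n_k * t_k
    return (-1) ** s

def inner(n, m, K):
    """Integral of w_n * w_m over [0, 1); both are constant on the 2^K intervals."""
    return sum(walsh(n, j, K) * walsh(m, j, K) for j in range(2 ** K)) / 2 ** K
```

With this indexing, $w_1$ and $w_2$ are the first two Rademacher functions, and the products $w_n$ with $n < 2^K$ are orthonormal in $L^2(\mathbb{T})$.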

More generally, we can take note of some work of Fine [39, I], who showed that in fact we can transfer the (countable) dual group of any compact metrizable abelian group into $L^2(\mathbb{T})$ so as to be an orthonormal set there, which is either finite or complete (and hence a frame). An analogous result remains valid for all spaces $L^2(P)$, $P$ a probability measure on some measure space, if we assume that the orthonormal set, rather than the underlying space, carries a group structure [39, II].

The groups $C_p^\infty$ are examples of a special kind of topological group, called Vilenkin groups. These are, by definition, abelian topological groups that are second countable, periodic (each element belongs to a compact subgroup), and totally disconnected. Any compact Vilenkin group is essentially the direct product $C_{p_1} \times C_{p_2} \times \cdots$, for certain primes $p_1, p_2, \ldots$. Such groups can be mapped onto $\mathbb{T}$ much as above, and their dual groups thereby go over into frames for $L^2(\mathbb{T})$. Because of the zero-dimensionality of a Vilenkin group, its dual group must be a torsion group; this special structure of the corresponding group-frames in $L^2(\mathbb{T})$ distinguishes them from general orthonormal systems there, and makes possible an elegant special theory of Vilenkin-Fourier series.

We conclude this section with a few remarks about the complexity of the group transform on a nondecomposable nonabelian group. This is the only situation that has not been discussed so far, and naturally it leads to some deeper issues in group theory than have appeared to date. The passage from abelian to nonabelian groups should be viewed in the same spirit as, for example, the passage from stationarity to nonstationarity (cf. comments early in Section II.4).

Let $G$ be a finite nonabelian group. If $G$ is decomposable, we can apply the analysis following Equation (III.42) to get a reduction in the complexity of the group transform $F_G$. If not, but there is a proper nontrivial subgroup $H$, we can proceed in just the same way as was done in Equation (III.43) for abelian groups. If this is done, it will be seen that the reduced operations count is essentially $O[N(m + s)]$ again (here $N = \operatorname{ord}(G)$, $m = \operatorname{ord}(H)$, $s = [G:H]$, as before). The problem now is that, unlike the abelian case, where the indecomposable factors had the special form of cyclic groups of prime power order, the subgroup structure of general indecomposable groups is much more varied and complex.

To obtain the maximum reduction in complexity we would ideally like to find a subgroup $H$ of maximal order in $G$, and then repeat this process in $H$, etc. This may indeed be possible for a particular $G$, but it is hard to generalize. There are two general ways to proceed: we can look at groups defined by special constructions (e.g., generators and relations, semidirect products), or by special properties (e.g., nilpotent, solvable), where at least the existence of an adequate number of subgroups may be guaranteed. In either approach our leitmotiv will be to consider only groups that are, in a suitable sense, ‘close to abelian’.

At the outset it must be recognized that, again unlike the abelian case, if an integer $m$ divides $\operatorname{ord}(G)$, there may not be a subgroup of order $m$. The standard example is the nonexistence of subgroups of order 6 in the alternating group $A_4$, which is of order 12. So it seems that we would like to restrict attention to groups $G$ with the following property: if $\operatorname{ord}(G) = mn$ with $(m, n) = 1$, then $G$ has a subgroup of order $m$. We encounter some good fortune here in that this property turns out to be characteristic of a large and familiar class of groups, and further, this class is closed under the operations of forming subgroups, products, and quotient groups. The class in question is the class of all solvable groups. Such groups were historically first considered in connection with the problem of studying roots of a polynomial over a field; in this context and in a nutshell, a polynomial is solvable by radicals only if its Galois group is solvable, and the general polynomial equation of degree $n$ has the symmetric group $S_n$ as Galois group, which, for $n \ge 5$, is not a solvable group.
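The $A_4$ example is small enough to verify by brute force. This Python sketch (our own, standard library only) represents $A_4$ as the even permutations of $\{0, 1, 2, 3\}$ and collects the orders of all subsets that contain the identity and are closed under composition, which for a finite group are exactly the subgroups. The orders found are 1, 2, 3, 4 and 12; 6 never occurs, although it divides 12.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Composition (p o q)(i) = p[q[i]] of permutations stored as tuples."""
    return tuple(p[i] for i in q)

def is_even(p):
    """A permutation is even iff it has an even number of inversions."""
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0

A4 = [p for p in permutations(range(4)) if is_even(p)]
IDENTITY = tuple(range(4))

def subgroup_orders(G, identity):
    """Orders of all subgroups of the finite group G, by brute force.  A nonempty
    finite subset closed under composition is a subgroup; it must contain the
    identity, so only subsets containing the identity are examined."""
    others = [g for g in G if g != identity]
    orders = set()
    for size in range(len(G)):
        for rest in combinations(others, size):
            S = set(rest) | {identity}
            if all(compose(a, b) in S for a in S for b in S):
                orders.add(len(S))
    return orders
```

The search visits $2^{11} = 2048$ subsets, which is instant here but grows exponentially, so it is only a check for very small groups.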

There are various (equivalent) definitions of a solvable group. Perhaps the most familiar is that the (simple) factors of a composition series should be as ‘simple’ as possible, that is, abelian and hence cyclic of prime order. This displays the sense in which solvable groups are close to abelian. How large is this class of finite groups? On the one hand, by the famous Feit-Thompson response (1963) to the classical Burnside conjecture, all groups of odd order are solvable. On the other hand, there is a great wealth of nonabelian simple groups (18 infinite classes, beginning with the alternating groups $A_n$, $n \ge 5$, and 26 additional ‘sporadic’ groups; this is the Classification Theorem [40]), and so there is a correspondingly large number of nonsolvable groups that can, in principle, be constructed, say by Schreier's approach to the extension problem. Of course, most of these nonabelian simple groups have impractically large orders. Thus the smallest orders occurring are 60, 168, 360, 504, 660, ....

Specific examples of solvable groups of even order include those of order $2^m p^n$, where p is a prime and $m, n \ge 0$; in particular, groups of order $2^n$.

A slightly more restrictive class of groups is especially convenient for optimally reducing the group transform complexity, for a given order. This is the class of nilpotent groups. Again, various definitions are possible. We give one motivated by subgroup structure. Suppose that G is a group of order N with prime factorization $N = p_1^{a_1} \cdots p_n^{a_n}$. By Sylow’s theorems there is a subgroup of order $p_i^{a_i}$ for each $i = 1, \ldots, n$, and for each such i, all subgroups of order $p_i^{a_i}$ are conjugate and hence isomorphic. These subgroups are the Sylow subgroups of G. Further, each Sylow subgroup contains a normal subgroup of every possible order $p_i^t$, $1 \le t \le a_i$. If, for each i, there is a unique Sylow subgroup, then G is the direct product of these subgroups, and conversely. Such groups are called nilpotent.
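For very small groups the Sylow criterion for nilpotency can be checked by brute force. The sketch below uses hypothetical helper names and an exhaustive subset search, so it is only practical for orders up to a dozen or so. It counts the subgroups of order $p_i^{a_i}$ for each prime $p_i$ dividing ord(G) and calls G nilpotent when every count equals 1.

```python
from itertools import combinations

def compose(p, q):
    """Permutation product p*q on tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Closure of the generators under composition; for permutations this is a group."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return sorted(group)

def prime_factorization(n):
    """Dictionary {p: a} with n equal to the product of the p**a."""
    factors, p = {}, 2
    while n > 1:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    return factors

def count_subgroups_of_order(group, m):
    """Subsets of size m that contain e and are closed under composition."""
    e = tuple(range(len(group[0])))
    others = [g for g in group if g != e]
    count = 0
    for rest in combinations(others, m - 1):
        s = set(rest) | {e}
        if all(compose(a, b) in s for a in s for b in s):
            count += 1
    return count

def sylow_counts(group):
    """Number of Sylow p-subgroups for each prime p dividing ord(G)."""
    return {p: count_subgroups_of_order(group, p ** a)
            for p, a in prime_factorization(len(group)).items()}

def is_nilpotent(group):
    """Nilpotent iff every Sylow subgroup is unique."""
    return all(n == 1 for n in sylow_counts(group).values())

S3 = generate([(1, 0, 2), (1, 2, 0)])
C6 = generate([(1, 2, 3, 4, 5, 0)])
D4 = generate([(1, 2, 3, 0), (3, 2, 1, 0)])    # rotation and reflection of a square
for name, G in [("S3", S3), ("C6", C6), ("D4", D4)]:
    print(name, len(G), sylow_counts(G), is_nilpotent(G))
# S3 6 {2: 3, 3: 1} False
# C6 6 {2: 1, 3: 1} True
# D4 8 {2: 1} True
```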

Additional details on the Sylow theorems, and on nilpotent or solvable groups, are available in standard sources; for example, the books by Hall [41], Hungerford [42], MacLane-Birkhoff [43], etc.


For applications, of course, we need some specific examples of nonabelian groups, preferably, as we have just indicated, ones that are nilpotent or at least solvable. We indicate next a few such examples. Among the most familiar examples of nonabelian groups are the symmetric group $S_n$ and its (normal) subgroup $A_n$, the alternating group. We noted earlier that $S_n$ is solvable only for $n \le 4$. The groups $S_n$ are interesting because of Cayley’s classical result that any group of order N is isomorphic to a subgroup of $S_N$ (via the right regular representation), while the $A_n$ are interesting as the ‘simplest’ examples of nonabelian simple groups (for $n \ge 5$).

Keeping with our theme of only considering groups that are close to abelian, we consider next semidirect products of abelian groups. The simplest of these cases arises when the abelian groups are cyclic; the semidirect product of two cyclic groups is called a metacyclic group. The most familiar examples of this construction are the dihedral groups (the symmetry groups of the regular n-gons). Given the cyclic groups $C_n$, $C_m$, the group generated by two elements a, b satisfying the relations

$$a^n = b^m = e, \qquad b\,a\,b^{-1} = a^k,$$

with $k^m \equiv 1 \pmod{n}$, can be shown [43, p. 462] to be a semidirect product of $C_n$ and $C_m$. The group $D_n$ is the special case where $k = n - 1$ and $m = 2$, so that ord($D_n$) = 2n. We find that $D_2$ is the (abelian) Klein 4-group, already mentioned in Section III.2, $D_3$ is the nonabelian group of least order, namely $S_3$, $D_4$ is the so-called octic group, etc. It turns out that dihedral groups are just those subgroups of an arbitrary group generated by a pair of distinct elements of order 2 (involutions), and that all subgroups of $D_n$ are either cyclic or dihedral.
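The metacyclic construction is concrete enough to code directly. In this sketch (our own encoding, not taken from [43]) an element $a^i b^j$ is stored as the pair (i, j); the relation $b a b^{-1} = a^k$ gives the product rule $(i_1, j_1)(i_2, j_2) = (i_1 + k^{j_1} i_2 \bmod n,\; j_1 + j_2 \bmod m)$. The loop checks the defining relation, associativity and commutativity for $D_2$, $D_3$, $D_4$ and for one non-dihedral choice, n = 7, m = 3, k = 2.

```python
def metacyclic(n, m, k):
    """<a, b | a^n = b^m = e, b a b^-1 = a^k> as pairs (i, j) standing for a^i b^j.
    Requires k^m = 1 (mod n), which makes the twisted product well defined."""
    if pow(k, m, n) != 1 % n:
        raise ValueError("need k^m = 1 (mod n)")
    elements = [(i, j) for j in range(m) for i in range(n)]

    def mul(x, y):
        (i1, j1), (i2, j2) = x, y
        # b^j1 a^i2 = a^(i2 k^j1) b^j1, so the a-exponent picks up the twist k^j1
        return ((i1 + i2 * pow(k, j1, n)) % n, (j1 + j2) % m)

    return elements, mul

def is_associative(elements, mul):
    return all(mul(mul(x, y), z) == mul(x, mul(y, z))
               for x in elements for y in elements for z in elements)

def is_abelian(elements, mul):
    return all(mul(x, y) == mul(y, x) for x in elements for y in elements)

a, b = (1, 0), (0, 1)
for n, m, k in [(2, 2, 1), (3, 2, 2), (4, 2, 3), (7, 3, 2)]:
    elements, mul = metacyclic(n, m, k)
    b_inv = (0, (m - 1) % m)                   # b^(m-1) = b^-1
    assert mul(mul(b, a), b_inv) == (k % n, 0)  # the defining relation b a b^-1 = a^k
    print(n, m, k, len(elements), is_associative(elements, mul), is_abelian(elements, mul))
# 2 2 1 4 True True      (D_2, the Klein 4-group)
# 3 2 2 6 True False     (D_3 = S_3)
# 4 2 3 8 True False     (D_4, the octic group)
# 7 3 2 21 True False    (a nonabelian group of order 21)
```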

Since  any  extension  of  a  solvable  group  by  a  solvable  group  is  again  solvable  [43,  p.  475],  it 
follows  in  particular  that  metacyclic  groups  and,  more  generally,  semidirect  products  of  abelian 
groups  are  solvable. 

The only other example of a nonabelian group of order < 10 is the quaternion group of order 8. This is the first nonabelian case of another class, $\{Q_n\}$, of groups of order 4n called dicyclic groups. $Q_n$ is defined abstractly as generated by a, b satisfying

$$a^{2n} = a^n b^2 = e, \qquad ab = ba^{-1}.$$

The quaternion group $Q_2$ derives its name from its interpretation as $\{\pm 1, \pm i, \pm j, \pm k\}$, with group structure derived from the corresponding multiplication in the four-dimensional skew field of all quaternions. $Q_2$ may also be obtained isomorphically as the matrix group generated by the complex matrices

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}.$$

(If we replaced i by 1 in the second matrix above, we would obtain $D_4$.)
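The two matrix realizations above can be compared numerically. The sketch below generates each group by closing the generator set under matrix multiplication and counts involutions: $Q_2$ has only one (namely $-I$), whereas $D_4$ has five, so the two groups are not isomorphic even though both have order 8. The helper names are ours.

```python
def matmul2(A, B):
    """Product of 2x2 complex matrices stored as ((a, b), (c, d))."""
    return tuple(tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
                 for r in range(2))

I2 = ((1, 0), (0, 1))

def generate(gens):
    """All products of the generators: the finite matrix group they generate."""
    group, frontier = {I2}, [I2]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = matmul2(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

def involutions(group):
    return [x for x in group if x != I2 and matmul2(x, x) == I2]

A = ((0, 1), (-1, 0))
B_quat = ((0, 1j), (1j, 0))
B_dih = ((0, 1), (1, 0))          # the second generator with i replaced by 1

Q2 = generate([A, B_quat])
D4 = generate([A, B_dih])
print(len(Q2), len(involutions(Q2)))   # 8 1
print(len(D4), len(involutions(D4)))   # 8 5
```

All entries stay in $\{0, \pm 1, \pm i\}$, so the floating-point complex arithmetic is exact and equality tests are safe here.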

Here we might make a quick comment intended to tie together some of this material with the general Heisenberg group concept mentioned at the end of Section II.4. Let p be a prime and F the field of p elements. The group whose elements are of the form given in Equation (II.21), with a, b, c ∈ F, is of order $p^3$. It turns out that there are only two distinct (nonisomorphic) nonabelian groups of this order for any such p, and this construction easily defines one of them. In the case p = 2, this group is just $D_4$. Groups of this special nature are of interest in harmonic analysis because they furnish the simplest examples of asymmetry of the norms of convolution operators on the associated $L^p$ spaces, $2 < p < \infty$ [50].
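A quick computational check of the $p^3$ claim, under an assumption: we take the elements of Equation (II.21) to be the upper unitriangular $3 \times 3$ matrices with entries a, b, c over F, so that $(a, b, c)(a', b', c') = (a + a', b + b', c + c' + ab')$. The sketch confirms order $p^3$ and noncommutativity for small p and, for p = 2, compares element-order counts with $D_4$. Among the five groups of order 8, only $D_4$ has one element of order 1, five of order 2 and two of order 4, so matching counts identify it.

```python
from collections import Counter

def heisenberg(p):
    """Upper unitriangular 3x3 matrices over Z/p, stored as (a, b, c) (assumed form of II.21)."""
    elements = [(a, b, c) for a in range(p) for b in range(p) for c in range(p)]
    def mul(x, y):
        (a1, b1, c1), (a2, b2, c2) = x, y
        return ((a1 + a2) % p, (b1 + b2) % p, (c1 + c2 + a1 * b2) % p)
    return elements, mul, (0, 0, 0)

def dihedral(n):
    """D_n as pairs (i, j) = a^i b^j with b a b^-1 = a^-1."""
    elements = [(i, j) for j in range(2) for i in range(n)]
    def mul(x, y):
        (i1, j1), (i2, j2) = x, y
        return ((i1 + (i2 if j1 == 0 else -i2)) % n, (j1 + j2) % 2)
    return elements, mul, (0, 0)

def order_profile(elements, mul, e):
    """Counter mapping each element order to the number of elements with that order."""
    def order(x):
        k, y = 1, x
        while y != e:
            y, k = mul(y, x), k + 1
        return k
    return Counter(order(x) for x in elements)

def is_abelian(elements, mul):
    return all(mul(x, y) == mul(y, x) for x in elements for y in elements)

for p in (2, 3, 5):
    elements, mul, _ = heisenberg(p)
    print(p, len(elements), is_abelian(elements, mul))      # order p^3, always False
print(sorted(order_profile(*heisenberg(2)).items()))        # [(1, 1), (2, 5), (4, 2)]
print(sorted(order_profile(*dihedral(4)).items()))          # [(1, 1), (2, 5), (4, 2)]
```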

This completes our brief résumé of relevant theory and examples of nonabelian groups. In practice, to apply the group transform methodology to signal coding or feature extraction, or to the group filters of the next section, we must have available a complete list of irreducible unitary representations for the underlying group. That is, given the group G (abelian or not), we must know a representation $T_i$ for each class comprising $\Gamma$, $1 \le i \le r$. Aid in this endeavor is provided by some facts presented in Section III.2, namely that r = number of conjugacy classes of G, and that the dimensions $d_i$ of the $T_i$ divide ord(G) and obey the constraint $d_1^2 + \cdots + d_r^2 = \mathrm{ord}(G)$. Then, in order to take advantage of possible savings in computational effort, we should also have available a composition series for G. Recall that this is a subnormal series (a descending sequence $\{H_j\}$ of subgroups with $G = H_0 \supset H_1 \supset H_2 \supset \cdots$, and each $H_j$ normal in $H_{j-1}$) of maximal length, and that any two such series are equivalent by the Jordan-Hölder theorem. Then the group transform can be efficiently computed, as we have shown, by nesting down the subgroups $\{H_j\}$.
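The counting facts just quoted can be turned into a small search. The sketch below (helper names are ours) computes r as the number of conjugacy classes and then lists every nondecreasing r-tuple of divisors of N that contains 1 (for the trivial representation) and whose squares sum to N. These are necessary conditions only; they do not construct the $T_i$, but for small groups they often pin down the dimensions completely.

```python
from itertools import combinations_with_replacement, permutations

def symmetric(n):
    """S_n as permutation tuples with composition (apply q first, then p)."""
    def mul(p, q):
        return tuple(p[q[i]] for i in range(n))
    return list(permutations(range(n))), mul

def dihedral(n):
    """D_n as pairs (i, j) = a^i b^j with b a b^-1 = a^-1."""
    def mul(x, y):
        return ((x[0] + (y[0] if x[1] == 0 else -y[0])) % n, (x[1] + y[1]) % 2)
    return [(i, j) for j in range(2) for i in range(n)], mul

def conjugacy_classes(elements, mul):
    e = next(x for x in elements if all(mul(x, y) == y for y in elements))
    inv = {x: next(y for y in elements if mul(x, y) == e) for x in elements}
    seen, classes = set(), []
    for x in elements:
        if x not in seen:
            cls = {mul(mul(g, x), inv[g]) for g in elements}
            seen |= cls
            classes.append(cls)
    return classes

def candidate_dimensions(N, r):
    """Nondecreasing r-tuples of divisors of N, including 1, with squares summing to N."""
    divisors = [d for d in range(1, N + 1) if N % d == 0]
    return [dims for dims in combinations_with_replacement(divisors, r)
            if dims[0] == 1 and sum(d * d for d in dims) == N]

for name, (elements, mul) in [("S3", symmetric(3)), ("D4", dihedral(4)), ("S4", symmetric(4))]:
    N, r = len(elements), len(conjugacy_classes(elements, mul))
    print(name, N, r, candidate_dimensions(N, r))
# S3 6 3 [(1, 1, 2)]
# D4 8 5 [(1, 1, 1, 1, 2)]
# S4 24 5 [(1, 1, 2, 3, 3)]
```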
III.5  GROUP  FILTERS


This  last  section  is  somewhat  more  speculative  than  the  others  in  this  chapter  and  is,  in  fact, 
a  major  reason  they  were  written.  Our  purpose  is  to  define  and  briefly  discuss  group  filters,  and 
to  examine  one  of  their  possible  roles  in  data  processing,  namely,  that  of  suboptimal  Wiener 
filters. Unlike most of the foregoing material there is not a great deal of precedent for the use of general group filters in this fashion; we can cite only the recent note by Trachtenberg [44], which was, in turn, based on earlier joint work with Karpovsky [45]. That work was concerned with various deterministic problems centered around the use of group filters to approximate more generally defined systems, such as multivariate time-invariant linear systems defined over a finite discrete time interval. Once a group G has been chosen, this problem reduces to the approximation of an operator on $L^2(G)$, derived from the impulse response matrix of the given system, by group filters on G, in the Hilbert-Schmidt operator norm. Of course, the selection of G is both basic and difficult.

The definition of group filters is motivated by the description of transform coding in Section III.1, combined with the use of group transforms. Namely, given a finite group G and a data vector $f \in L^2(G)$, we perform the following sequence of operations on f:

$$f \mapsto \hat{f} = F_G(f) \mapsto D\hat{f} \mapsto F_G^{-1}(D\hat{f}), \tag{III.47}$$

where D is a diagonal operator on the transform domain. That is, D is defined on $L^2(\Gamma)$ by

$$D(A_1, \ldots, A_r) = (A_1 D_1, \ldots, A_r D_r), \tag{III.48}$$

where each $D_i$ is an operator on the $d_i$-dimensional space $V_i$ of the ith irreducible representation $T_i$, in the notation of Equation (III.26). Thus the effect of the group filter is to selectively weight the spectral components $T_i(f)$, $i = 1, \ldots, r$, of the data. Extending the terminology of Pearl [7], a group filter is a linear basis-restricted transformation whose unitary component is a group transform.

Formally, then, a group filter is an operator on $L^2(G)$ of the form

$$F_G^{-1} \circ D \circ F_G,$$

where $F_G$ is the group transform and D has the form displayed in Equation (III.48). An operations count reveals that such operators are generally of lower computational complexity than arbitrary operators on $L^2(G)$. Namely, if G is chosen so that $F_G$ has a fast algorithm in the sense of Section III.4, and N = ord(G), then D can be applied with $d_1^3 + \cdots + d_r^3$ multiplications (this reduces to $d_1^2 + \cdots + d_r^2 = N$ when G is abelian, so that every $d_i = 1$), and, hopefully, the transform and its inverse can be done with $O(N \log N)$ multiplications each. Therefore, with reasonable choices of G (in particular, groups whose representation dimensions $d_i$ stay small), we can expect the multiplicative complexity of a group filter on G to be $N[1 + O(\log N)]$, compared with $N^2$ for a general operator.

From general properties of group transforms we see immediately that, alternatively, a group filter is simply a right convolution operator on $L^2(G)$. Such an operator sends f into f * d, where $T_i(d) = D_i$, $1 \le i \le r$. (There is, of course, an equivalent theory of group filters involving left multiplication in Equation (III.48) and left convolution operators.)
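To make (III.47) and (III.48) concrete for a nonabelian group, here is a sketch for $S_3 = D_3$ with its three irreducible unitary representations: trivial, sign, and the two-dimensional representation sending a to a rotation by 120 degrees and b to a reflection. The normalizations are our assumption, chosen to agree with $c_h = p(h^{-1})/N$ in (III.50) below: $(f * p)(g) = \frac{1}{N}\sum_k f(k)\,p(k^{-1}g)$, $\hat f(i) = \frac{1}{N}\sum_g f(g)\,T_i(g)$, and $f(g) = \sum_i d_i \operatorname{tr}[\hat f(i)\,T_i(g^{-1})]$. The check confirms that right-multiplying the transform blocks by $D_i = T_i(d)$ reproduces right convolution by d.

```python
import math

# S3 realized as D3: (k, s) stands for a^k b^s, with a^3 = b^2 = e and b a b^-1 = a^-1.
ELEMENTS = [(k, s) for s in range(2) for k in range(3)]
N = len(ELEMENTS)
DIMS = [1, 1, 2]                       # trivial, sign and two-dimensional irreps

def mul(x, y):
    (k1, s1), (k2, s2) = x, y
    return ((k1 + (k2 if s1 == 0 else -k2)) % 3, (s1 + s2) % 2)

def inv(x):
    return next(y for y in ELEMENTS if mul(x, y) == (0, 0))

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

ROT = [[math.cos(2 * math.pi / 3), -math.sin(2 * math.pi / 3)],
       [math.sin(2 * math.pi / 3), math.cos(2 * math.pi / 3)]]    # image of a
FLIP = [[1.0, 0.0], [0.0, -1.0]]                                   # image of b

def irreps(x):
    """[T_1(x), T_2(x), T_3(x)] with T_3(a^k b^s) = ROT^k FLIP^s."""
    k, s = x
    two_dim = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        two_dim = matmul(two_dim, ROT)
    if s:
        two_dim = matmul(two_dim, FLIP)
    return [[[1.0]], [[-1.0 if s else 1.0]], two_dim]

def transform(f):
    """Group transform: fhat(i) = (1/N) sum_g f(g) T_i(g)."""
    return [[[sum(f[g] * irreps(g)[i][r][c] for g in ELEMENTS) / N for c in range(d)]
             for r in range(d)] for i, d in enumerate(DIMS)]

def inverse_transform(fhat):
    """Inverse transform: f(g) = sum_i d_i tr[fhat(i) T_i(g^-1)]."""
    return {g: sum(d * trace(matmul(fhat[i], irreps(inv(g))[i])) for i, d in enumerate(DIMS))
            for g in ELEMENTS}

def group_filter(f, D):
    """(III.47)-(III.48): transform, right-multiply block i by D_i, transform back."""
    return inverse_transform([matmul(A, Di) for A, Di in zip(transform(f), D)])

def convolve(f, p):
    """Right convolution (f * p)(g) = (1/N) sum_k f(k) p(k^-1 g)."""
    return {g: sum(f[k] * p[mul(inv(k), g)] for k in ELEMENTS) / N for g in ELEMENTS}

f = dict(zip(ELEMENTS, [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]))         # arbitrary data vector
kernel = dict(zip(ELEMENTS, [0.2, -0.4, 1.0, 0.7, 0.0, 0.3]))    # arbitrary kernel d
via_transform = group_filter(f, transform(kernel))               # D_i = T_i(d)
via_convolution = convolve(f, kernel)
print(max(abs(via_transform[g] - via_convolution[g]) for g in ELEMENTS) < 1e-9)   # True
```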
Having thus defined group filters in both the ‘time domain’ (convolution) and ‘frequency domain’ (multiplication), we can now state their basic application: for a given problem of discrete system simulation or of signal estimation, involving N-dimensional data, choose, for a specified group G of order N, an optimal group filter on G. The term ‘optimal’ is deliberately a little vague here, and an optimality criterion must naturally be specified. Often it is defined by a quadratic functional on $\mathcal{L}[L^2(G)]$, the space of linear operators on $L^2(G)$. Motivations for selecting group filters as approximating devices include speed of computation, suitability for specific architectures (especially filters on dyadic groups), and suitability for specific signal statistics; see also [45].

Before getting into this topic in greater depth we want to offer just a few words of additional perspective here. Convolution operators on the standard infinite lca groups G, such as the circle group or $\mathbb{R}^k$, are familiar and powerful tools of analysis. Much of their effectiveness is based on the concept of an approximate identity. This is a sequence $\{\delta_n\}$ in $L^1(G)$ with the property that for every $f \in L^1(G)$,

$$f = \lim_{n \to \infty} f * \delta_n \; \Big(= \lim_{n \to \infty} \delta_n * f\Big)$$

in the metric of $L^1(G)$. Such sequences can be constructed on any of the standard lca groups by taking a bounded sequence $\{\delta_n\}$ in $L^1(G)$ with the properties that

$$\int_G \delta_n \, dm_G = 1$$

and

$$\lim_{n \to \infty} \int_{G \setminus V} |\delta_n| \, dm_G = 0$$

for each neighborhood V of e. For example, any sequence of probability density functions whose supports decrease to e would qualify, as would any sequence on $\mathbb{R}^k$ defined by

$$\delta_n(x) = n^k \phi(nx) \tag{III.49}$$

for some fixed $\phi \in L^1(\mathbb{R}^k)$ with $\int \phi(x)\,dx = 1$. On the circle group, the Fejér kernel is a classical example of an approximate identity in Fourier series; the corresponding convolution operators are the Cesàro sum operators. (The Dirichlet kernel, which yields the partial Fourier series operators, is not an approximate identity in this sense, since its $L^1$ norms grow without bound.) In a different direction, the Gauss kernel $\phi(x) = (1/\sqrt{2\pi})\exp(-x^2/2)$ will serve in Equation (III.49); the corresponding convolution operators, for positive real n, solve the classical heat equation at a time proportional to $1/n^2$. Another kind of convolution operator, derived from the Poisson kernel, solves the classical Dirichlet boundary value problem.
So convolution operators are involved in many branches of analysis, such as partial differential equations, potential theory, operator semigroups, Fourier analysis, etc. The point is that these familiar operators and approximation techniques are not part of our subject here. When the underlying group is finite, as is the case with discrete data processing, there is an identity for convolution, so approximate identities are not required. Convergence is not an issue; rather, the issue is the selection of high-speed linear data processors for specific tasks in a certain statistical environment.
Let’s now look briefly at the nature of group filters, both individually and collectively, and then move on to their applications as suboptimal Wiener filters. We’ll denote the set of all group filters on a given group G by $\Phi(G)$, and an element of $\Phi(G)$ with kernel p by $T_p$:

$$T_p(f) = f * p, \qquad f \in L^2(G).$$

Each operator has its adjoint $(T_p)^* = T_{p^*}$. For the operator norm $\|T_p\|$ and the Hilbert-Schmidt norm $\|T_p\|_2$ we have

$$\|T_p\| \le \|T_p\|_2 = \|p\|,$$

so the Hilbert-Schmidt norm of the group filter $T_p$ equals the norm of p as an element of $L^2(G)$, and the operator norm is bounded by it. A group filter $T_p$ is a normal operator whenever G is abelian or, more generally, whenever p is normal: $p * p^* = p^* * p$.

The set $\Phi(G)$ of all group filters on G is clearly a unital subalgebra of the entire operator algebra on $L^2(G)$. Indeed, it is the commutant of the set of left convolution operators, and conversely. (This relationship is, suitably interpreted, valid for general unimodular locally compact groups; see [46] and Reference [5] of Chapter II.)

Next we show that the group filter algebra $\Phi(G)$ is N-dimensional, and exhibit a frame-basis for it. First, from the definition of convolution,

$$T_p f(g) = \sum_{h \in G} c_h \, R(h) f(g), \tag{III.50}$$

where $R(\cdot)$ is the right regular representation of G on $L^2(G)$, and $c_h = p(h^{-1})/N$. This, and the fact that each R(h) is of the form $T_p$, where $p = N e_{h^{-1}}$, shows that $\Phi(G) = \operatorname{span}\{R(h) : h \in G\}$. But also

$$\operatorname{tr}[R(h)] = \begin{cases} N, & h = e, \\ 0, & \text{otherwise}, \end{cases}$$

and, therefore, since

$$\langle R(g), R(h) \rangle = \operatorname{tr}[R(g)\,R(h)^*] = \operatorname{tr}[R(gh^{-1})],$$

the operators $\{N^{-1/2} R(g) : g \in G\}$ constitute a frame in $\Phi(G)$.

(Another way to obtain the structure of $\Phi(G)$ is simply to note that the mapping $p \mapsto T_p$ is a one-to-one anti-isomorphism from $L^2(G)$ onto $\Phi(G)$. It is norm-preserving and maps the frame elements $\sqrt{N} e_g$, $g \in G$, of $L^2(G)$ onto the elements $N^{-1/2} R(g^{-1})$ of $\Phi(G)$, so these must constitute a frame there.)
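The trace computation can be checked directly on $S_3$ by writing each R(h) as an $N \times N$ permutation matrix. The sketch below verifies that $\langle R(g), R(h) \rangle = N\,\delta_{g,h}$, so the $N^{-1/2}R(g)$ are orthonormal, and that $\sum_h c_h R(h)$ with $c_h = p(h^{-1})/N$ acts on f exactly as right convolution by p. The normalized convolution $(f * p)(g) = \frac{1}{N}\sum_k f(k)\,p(k^{-1}g)$ is our assumption, chosen because it is the convention under which (III.50) holds.

```python
from itertools import permutations

ELEMENTS = list(permutations(range(3)))      # S3 as permutation tuples
N = len(ELEMENTS)
INDEX = {g: i for i, g in enumerate(ELEMENTS)}

def mul(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    out = [0] * len(p)
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)

def R(h):
    """Right regular representation as a matrix: (R(h) f)(g) = f(g h)."""
    M = [[0] * N for _ in range(N)]
    for g in ELEMENTS:
        M[INDEX[g]][INDEX[mul(g, h)]] = 1
    return M

def inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A B^T) for real matrices."""
    return sum(A[i][j] * B[i][j] for i in range(N) for j in range(N))

def group_filter_matrix(p):
    """T_p = sum_h c_h R(h) with c_h = p(h^-1) / N, as in (III.50)."""
    T = [[0.0] * N for _ in range(N)]
    for h in ELEMENTS:
        c_h, Rh = p[inv(h)] / N, R(h)
        for i in range(N):
            for j in range(N):
                T[i][j] += c_h * Rh[i][j]
    return T

def convolve(f, p):
    """(f * p)(g) = (1/N) sum_k f(k) p(k^-1 g)."""
    return {g: sum(f[k] * p[mul(inv(k), g)] for k in ELEMENTS) / N for g in ELEMENTS}

gram = [[inner(R(g), R(h)) for h in ELEMENTS] for g in ELEMENTS]
print(all(gram[i][j] == (N if i == j else 0) for i in range(N) for j in range(N)))   # True

f = dict(zip(ELEMENTS, [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]))      # arbitrary data vector
p = dict(zip(ELEMENTS, [0.3, 1.0, -0.7, 0.2, 0.4, -1.1]))     # arbitrary kernel
T = group_filter_matrix(p)
Tf = {g: sum(T[INDEX[g]][INDEX[x]] * f[x] for x in ELEMENTS) for g in ELEMENTS}
conv = convolve(f, p)
print(max(abs(Tf[g] - conv[g]) for g in ELEMENTS) < 1e-12)   # True
```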

It follows that the problem of system simulation, which reduces to the approximation of an operator on $L^2(G)$ by a group filter, is easily solved by an orthogonal projection of that operator on $\Phi(G)$. In particular, it might be tempting to select a group filter for Wiener filtering by simply projecting the Wiener operator W of Equation (II.3) onto $\Phi(G)$. However, there is no reason to believe that this projection will minimize the mean square error over all group filters.

To obtain the best group filter for Wiener filtering we have two distinct, but ultimately equivalent, methods. In either case we are presented with a data vector $y \in L^2(G)$ of the form

$$y = s + \eta$$

as per Equation (II.1), with known covariance operators $C_s$, $C_\eta$ on $L^2(G)$. As usual, the signal s and the noise η are assumed uncorrelated. In the first method, we let T be an operator on $L^2(G)$. Then, for fixed s,

$$e(s;T) = E(\|s - Ty\|^2) = \|s - Ts\|^2 + \operatorname{tr}(T C_\eta T^*).$$

Next, averaging over the signal prior distribution, we obtain the mean square error for each possible estimation operator T as

$$\begin{aligned}
e(T) &= \operatorname{tr}[(I - T) C_s (I - T)^* + T C_\eta T^*] \\
&= \operatorname{tr}[T(C_s + C_\eta)T^*] - 2\operatorname{tr}(T C_s) + \operatorname{tr}(C_s) \\
&= \langle T(C_s + C_\eta), T \rangle - 2\langle T, C_s \rangle + \operatorname{tr}(C_s),
\end{aligned}$$

a quadratic functional of T.

Assuming the operator $C_s + C_\eta$ to be invertible (and, therefore, positive definite), the unconstrained minimum of $e(\cdot)$ over all T occurs at the Wiener filter $W = C_s(C_s + C_\eta)^{-1}$, as noted in Equation (II.3). Its mean square error was given in Equation (III.2). By contrast, the minimum of $e(\cdot)$ over the subspace $\Phi(G)$ will be attained at that group filter $T_w$ for which

$$T_w(C_s + C_\eta) - C_s \perp \Phi(G). \tag{III.51}$$

If we express $T_w$ as in Equation (III.50), the optimality condition (III.51) yields a system of linear equations for the coefficients $\{c_h : h \in G\}$:

$$\sum_{h \in G} a_{gh}\, c_h = b_g, \qquad g \in G, \tag{III.52}$$

where

$$b_g = \langle C_s, R(g) \rangle \quad \text{and} \quad a_{gh} = \langle C_s + C_\eta, R(h^{-1}g) \rangle.$$

Thus, the matrix coefficients $\{a_{gh}\}$ of this system have the Toeplitz-like form $a_{gh} = \phi(h^{-1}g)$, where φ is easily seen to be a positive-definite function on G; the matrix $[a_{gh}]$ is therefore positive semidefinite. In fact, this matrix is positive definite, as long as the operator $C_s + C_\eta$ is invertible (which we are assuming).
Let $T_w$ be the group filter whose coefficients in its expression (III.50) are the unique solution to the system (III.52). $T_w$ is then our suboptimal Wiener filter. The impulse response function w is explicitly given by $w(g^{-1}) = N c_g$, $g \in G$. What is the additional mean square error incurred by the use of the filter $T_w$ in place of the Wiener operator W? The answer is that

$$e(T_w) = e(W) + \langle W - T_w, C_s \rangle; \tag{III.53}$$

that is, the additional error is essentially the component of the difference operator along the signal covariance operator.
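The direct method amounts to solving the $N \times N$ system (III.52). The following sketch does this on $S_3$ (N = 6) for a hypothetical first-order Markov signal covariance $\rho^{|i-j|}$ with ρ = 0.9 and white noise of variance 0.5; indexing the samples by the list order of the group elements is an arbitrary choice made only for illustration. It then checks three statements from the text: $e(W) \le e(T_w)$, the error identity (III.53), and the bound (III.54). It also confirms that simply projecting W onto $\Phi(G)$ does no better than $T_w$.

```python
from itertools import permutations

ELEMENTS = list(permutations(range(3)))   # S3; sample n <-> ELEMENTS[n] (arbitrary choice)
N = len(ELEMENTS)
INDEX = {g: i for i, g in enumerate(ELEMENTS)}

def mul(p, q):
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    return next(q for q in ELEMENTS if mul(p, q) == (0, 1, 2))

def R(h):
    """Right regular representation as a matrix: (R(h) f)(g) = f(g h)."""
    M = [[0.0] * N for _ in range(N)]
    for g in ELEMENTS:
        M[INDEX[g]][INDEX[mul(g, h)]] = 1.0
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A B^T) for real matrices."""
    return sum(A[i][j] * B[i][j] for i in range(N) for j in range(N))

def trace(A):
    return sum(A[i][i] for i in range(N))

def combo(coeffs):
    """The group filter sum_h c_h R(h) of (III.50)."""
    T = [[0.0] * N for _ in range(N)]
    for c_h, h in zip(coeffs, ELEMENTS):
        Rh = R(h)
        for i in range(N):
            for j in range(N):
                T[i][j] += c_h * Rh[i][j]
    return T

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

RHO, NOISE = 0.9, 0.5                      # hypothetical signal correlation and noise variance
Cs = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]
Cy = [[Cs[i][j] + (NOISE if i == j else 0.0) for j in range(N)] for i in range(N)]  # Cs + C_eta

def mse(T):
    """e(T) = tr[T (Cs + C_eta) T^T] - 2 tr(T Cs) + tr(Cs)."""
    return inner(matmul(T, Cy), T) - 2 * trace(matmul(T, Cs)) + trace(Cs)

W = [solve(Cy, Cs[j]) for j in range(N)]   # rows of the Wiener filter Cs (Cs + C_eta)^-1
a = [[inner(Cy, R(mul(inv(h), g))) for h in ELEMENTS] for g in ELEMENTS]   # a_gh
b = [inner(Cs, R(g)) for g in ELEMENTS]                                     # b_g
Tw = combo(solve(a, b))                                     # optimal group filter T_w
P_W = combo([inner(W, R(h)) / N for h in ELEMENTS])         # projection of W onto Phi(G)

diff = [[W[i][j] - Tw[i][j] for j in range(N)] for i in range(N)]
e_W, e_Tw, e_PW = mse(W), mse(Tw), mse(P_W)
print(e_W <= e_Tw + 1e-12 and e_Tw <= e_PW + 1e-12)                      # True
print(abs(e_Tw - (e_W + inner(diff, Cs))) < 1e-9)                         # (III.53): True
print(e_Tw - e_W <= (inner(diff, diff) * inner(Cs, Cs)) ** 0.5 + 1e-12)   # (III.54): True
```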

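Various games can now be played with the right-hand side of the resulting estimate

$$0 \le e(T_w) - e(W) \le \|W - T_w\|_2 \, \|C_s\|_2, \tag{III.54}$$

depending on what is assumed about the signal and noise statistics. For example, in terms of the normal power spectrum $\{\lambda_1, \ldots, \lambda_N\}$ of the signal s (equivalently, the spectrum of the covariance operator $C_s$), we have the bound

$$\|C_s\|_2 = (\lambda_1^2 + \cdots + \lambda_N^2)^{1/2} \le \lambda_1 + \cdots + \lambda_N = E(\|s\|^2),$$

from Equation (III.9). If the signal is Gaussian (zero-mean and real-valued), then we can derive a sharp expression for $\|C_s\|_2$ by diagonalizing $C_s$ and expanding s in the eigenvector frame $\{u_1, \ldots, u_N\}$. The coefficients $c_i$ are then independent Gaussian variables with $E(|c_i|^2) = \lambda_i$ and $E(|c_i|^4) = 3E(|c_i|^2)^2$, and

$$s = c_1 u_1 + \cdots + c_N u_N, \qquad \|s\|^2 = |c_1|^2 + \cdots + |c_N|^2, \qquad \|s\|^4 = \sum_{i,j} |c_i|^2 |c_j|^2,$$

$$\begin{aligned}
E(\|s\|^4) &= \sum_{i,j} E(|c_i|^2 |c_j|^2) \\
&= \sum_{i=1}^{N} E(|c_i|^4) + \sum_{i \ne j} E(|c_i|^2)\,E(|c_j|^2) \\
&= 3\sum_{i=1}^{N} E(|c_i|^2)^2 + \sum_{i \ne j} E(|c_i|^2)\,E(|c_j|^2) \\
&= 2\sum_{i=1}^{N} E(|c_i|^2)^2 + \Big[\sum_{i=1}^{N} E(|c_i|^2)\Big]^2,
\end{aligned}$$

and, finally,

$$\|C_s\|_2^2 = \lambda_1^2 + \cdots + \lambda_N^2 = \tfrac{1}{2}\left[E(\|s\|^4) - E(\|s\|^2)^2\right].$$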
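The final identity lends itself to a quick simulation check. The sketch below draws independent coefficients $c_i \sim N(0, \lambda_i)$ for a hypothetical spectrum (3, 2, 1, 0.5), estimates $E(\|s\|^2)$ and $E(\|s\|^4)$ by Monte Carlo, and compares $\frac{1}{2}[E(\|s\|^4) - E(\|s\|^2)^2]$ with the exact value $\sum_i \lambda_i^2 = 14.25$. Only the norms enter, so the eigenvectors $u_i$ never need to be formed.

```python
import random

def gaussian_norm_moments(lams, samples=100_000, seed=1):
    """For s = sum_i c_i u_i with independent c_i ~ N(0, lam_i), return
    (sum_i lam_i^2, Monte Carlo estimate of [E||s||^4 - (E||s||^2)^2] / 2)."""
    rng = random.Random(seed)
    m2 = m4 = 0.0
    for _ in range(samples):
        q = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lams)   # ||s||^2 = sum_i c_i^2
        m2 += q
        m4 += q * q
    m2, m4 = m2 / samples, m4 / samples
    return sum(lam * lam for lam in lams), 0.5 * (m4 - m2 * m2)

exact, estimate = gaussian_norm_moments([3.0, 2.0, 1.0, 0.5])   # hypothetical spectrum
print(exact)      # 14.25
print(estimate)   # close to 14.25; the exact digits depend on the seed
```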
The other term on the right side of Equation (III.54) can be bounded by geometrical arguments: let $\{c_g\}$ be the coordinates of $T_w$ in $\Phi(G)$ (relative to the frame $\{N^{-1/2}R(g)\}$), and let $d_g = \langle W, R(g) \rangle / \sqrt{N}$, $g \in G$, be the coordinates of the projection of W on $\Phi(G)$. Then one can check that

$$\|W - T_w\|_2^2 = \|W\|_2^2 + \|w\|^2 - 2\sum_{g \in G} c_g d_g,$$

from two applications of the Pythagorean formula.

This completes our discussion of the first method for obtaining the optimum group filter $T_w$ for Wiener filtering. We may refer to it as the ‘direct method,’ as it operates directly on the sample space $L^2(G)$ and associated operators. However, it is unsatisfying in that no formula for the filter $T_w$ is obtained; we might indeed say that the solution is only indirectly presented via the Equations (III.52). A desire to remedy this difficulty, together with prior experience with deconvolution problems, leads us to the second, ‘indirect method.’ In this approach we attempt to identify the optimum response function $w \in L^2(G)$ rather than the operator $T_w$, and we do so by posing the problem as

$$w = \arg\min_{p \in L^2(G)} E(\|y * p - s\|^2).$$

Letting now e(p) denote the expectation on the right-hand side above, averaged first over the noise and then over the signal distributions, we have, after a Fourier transformation and some algebra,

$$\begin{aligned}
e(p) &= E(\|y * p - s\|^2) \\
&= \operatorname{tr}[p^*(C_s + C_\eta)\,p - 2C_s\,p + C_s],
\end{aligned}$$

a quadratic function of $p \in L^2(\Gamma)$. Its unconstrained minimum occurs at

$$w = C_s(C_s + C_\eta)^{-1}, \tag{III.55}$$

a formula which is essentially given, without proof, in [44]. (In fact, we have chosen the notation $C_s$, $C_\eta$ to agree, as much as possible, with that of [44].) We note that, for instance, $C_s$ is that element of $L^2(\Gamma)$ whose ith value is the operator

$$N^{-2} \sum_{g,h} E[s(g)s(h)]\; T_i(h^{-1}g),$$

so that if P is the operator of right multiplication by A on $L^2(\Gamma)$, we have $E(\|P(s)\|^2) = \langle C_s A, A \rangle$. Hence the optimal w in Equation (III.55) is computed as a product in the H*-algebra $L^2(\Gamma)$.

Thus, the indirect method gives the explicit formula (III.55) for the transform of the optimal response function w, in terms of the signal and noise covariance components. Since the major point of using group filters as suboptimal Wiener filters is their reduced computational complexity, obtained by group transforming the data vector, multiplying by w, and inverse transforming, this method is to be preferred to the direct method because it makes w immediately available.

In analogy with Equation (III.2) one can verify that the minimal mean square error is given by

$$\begin{aligned}
e(w) &= \sum_{i=1}^{r} d_i \operatorname{tr}\Big[\big(C_s - C_s(C_s + C_\eta)^{-1}C_s\big)(i)\Big] \\
&= \sum_{i=1}^{r} d_i \operatorname{tr}\Big\{[e(i) - w(i)]\,C_s(i)\Big\},
\end{aligned}$$

with e the identity in $L^2(\Gamma)$. Unfortunately, this does not seem as useful in assessing the increase in error over the Wiener filter as is the expression of Equation (III.53) and the subsequent estimate (III.54).
IV.  CONCLUSIONS  AND  OUTLOOK


We  will  now  summarize  the  foregoing  material  and  emphasize  some  key  points;  along  the 
way  we  will  suggest  a  few  promising  directions  of  further  research. 

Chapter  I  represents  an  attempt  to  offer  some  general  mathematical  perspective  on  a  very 
large  and  disparate  field.  It  is  deliberately  presented  at  a  low  technical  level,  so  as  to  be  widely 
accessible.  There  is  also  an  attempt  to  be  just  a  bit  provocative  by  designating  a  few  results  as 
being  most  fundamental  from  a  mathematical  viewpoint. 

In a more serious vein we proposed a triangular array of mathematical areas as foundational for signal processing, namely, probability/statistics, Hilbert spaces/operator theory, and group representations/harmonic analysis. The value and interplay of the first two areas is by now familiar and well developed, and is not discussed herein in much detail. Let us just stress once again that such analysis begins with Equation (II.4), which we feel it fair to designate as the ‘Fundamental Equation of Signal Processing’. Depending on whether the unknown signal x there is deemed to be deterministic or random, and on the nature of the constraints and other prior information available concerning x, a variety of filters can be devised to optimize a particular performance measure. The Wiener and Gauss-Markov filters of Section III.1 are standard examples for recovering a random and a deterministic signal, respectively. In a different direction, the method of projection onto convex sets (POCS) has become popular over the last few years. Here, all information about an unknown deterministic signal x (data constraints) is combined to locate x in the intersection of a family of convex sets, and iterations involving the (generally nonlinear) projections onto these sets are constructed to yield sequences which converge, at least weakly, to a signal satisfying all of the constraints [1]. The inherent nonuniqueness of these methods may be controlled by introducing a further cost functional [2].
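A toy illustration of POCS, not drawn from [1] or [2]: two hypothetical convex constraint sets in the plane, the half-plane $x_1 + x_2 \ge 2$ and the disk $\|x\| \le 2$, with the (nonlinear) projection onto each applied alternately. In finite dimensions the iterates converge to some point of the intersection; which point is reached depends on the starting guess, which is the nonuniqueness mentioned above.

```python
import math

def project_halfspace(x, a, beta):
    """Orthogonal projection onto the half-space {x : a . x >= beta}."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    if dot >= beta:
        return list(x)
    scale = (beta - dot) / sum(ai * ai for ai in a)
    return [xi + scale * ai for ai, xi in zip(a, x)]

def project_ball(x, center, radius):
    """Orthogonal projection onto the closed ball ||x - center|| <= radius."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(x)
    return [ci + radius * d / dist for ci, d in zip(center, diff)]

def pocs(x0, projections, iterations=200):
    """Apply the projections cyclically, starting from x0."""
    x = list(x0)
    for _ in range(iterations):
        for project in projections:
            x = project(x)
    return x

A, BETA = (1.0, 1.0), 2.0            # hypothetical data constraint x1 + x2 >= 2
CENTER, RADIUS = (0.0, 0.0), 2.0     # hypothetical energy constraint ||x|| <= 2
x = pocs([-3.0, 0.5], [lambda v: project_ball(v, CENTER, RADIUS),
                       lambda v: project_halfspace(v, A, BETA)])
print([round(v, 6) for v in x])      # a point satisfying both constraints
```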

The  essential  point  here  is  that  all  these  Hilbert  space-centered  methods  have  not  been  our 
major  concern.  We  have  rather  chosen  to  study  the  role  of  our  third  foundational  area:  the 
group-related  analysis.  In  doing  so  we  eventually  discerned  three  classes  of  application  which 
could  be  indexed  by  the  kind  of  group  involved.  Thus: 


Group type      Application
Finite          Digital signal processing (transform coding, pattern recognition,
                fast suboptimal filters)
Infinite lca    Weakly stationary and harmonizable signal models, filters, sampling
Heisenberg      Characterization of lca group transform; connection with
                uncertainty principle and ambiguity function
The applications involving infinite groups were surveyed rather quickly in Sections II.2-II.5. Those involving the lca groups seem to have reached a mature stage, with work remaining at a fairly abstract level. It is worth emphasizing again that the group theoretic approach provides a systematic and unified approach to a frequency domain theory for weakly stationary processes, along with the associated invariant filters. By contrast, those applications involving the real Heisenberg groups and their representations are of quite recent development, are at a higher level of mathematical complexity, and are of somewhat more uncertain value. The connections with radar theory seem particularly worthy of further research efforts.

Finally, half of this report has been devoted to what appears to be the most promising area of immediate applications to the practice of digital data processing. The essential idea is the systematic use of those finite dimensional unitary transforms which can be realized as group transforms of some finite group. We noted in Section III.4 that, as a consequence of Kellogg's theorem, only nonabelian group transforms can be expected to significantly improve on the ordinary DFT in terms of error reduction in signal compression or filtering, although there may, for some purposes, be computational advantages which devolve from the group transform on some noncyclic abelian group (e.g., the Walsh-Hadamard transform on a dyadic group).

We  can  suggest  some  fairly  natural  research  questions  connected  with  this  material  and, 
indeed,  Chapter  III  should  be  viewed  as  the  necessary  background  and  motivation  for  these 
questions,  at  the  most  elementary  level.  They  all  center  around  the  association  between  a  given 
covariance  matrix  (representing  the  second-order  signal  statistics)  and  the  optimal  group,  of 
appropriate  order,  for  a  particular  signal  processing  task.  Having  fixed  such  a  task,  such  as 
signal  decorrelation  or  Wiener  filtering  with  additive  white  noise  at  a  specified  SNR,  it  is 
possible, in principle, to partition the cone of N × N positive semidefinite matrices into a finite
number  of  subsets  indexed  by  the  appropriate  optimal  group.  The  number  of  subsets  in  this 
partition  would  equal  1  +  the  number  of  nonabelian  groups  of  order  N.  The  cone  of  matrices 
might  be  further  reduced  by  imposing  a  bound  on  their  norm  (=  spectral  radius)  or  by  requiring 
a  constant  diagonal  (equal  variances).  For  a  fixed  block  length  N,  and  signal  processing  task,  this 
is  itself  a  kind  of  pattern  classification  problem  with  the  groups  being  the  ‘patterns’. 

Suppose, to be specific, that we consider the task of suboptimal Wiener filtering via group filters, as in Section III.5. Starting with the formula for the Wiener filter W in Equation (II.3), and the error formulas (III.2) and (III.53), we can readily derive the simple error relation

$$e(T_w) = \langle I - T_w, C_s \rangle$$

for the optimal group filter $T_w$. This quantity can now be used as a performance measure to select the corresponding optimal group for each given signal covariance $C_s$. A few such studies for various data lengths N should reveal the potential of nonabelian groups and their associated transforms and filters to replace conventional methods based on the use of cyclic groups and the DFT.
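The proposed group-selection study can be prototyped in a few lines. This sketch uses hypothetical statistics (N = 6, a first-order Markov covariance with ρ = 0.9, white noise of variance 0.5) and assigns samples to group elements by list order, an arbitrary choice. It solves (III.52) on the cyclic group $Z_6$ (where $T_w$ is a circulant, i.e. DFT-based, filter) and on $S_3$, scores each by $e(T_w) = \langle I - T_w, C_s \rangle$, and reports the smaller error. No claim is made about which group wins for other statistics or orderings; mapping that out is exactly what such studies would do.

```python
from itertools import permutations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A B^T) for real matrices."""
    return sum(x * y for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))

def regular_rep(elements, mul):
    """Matrices of the right regular representation: (R(h) f)(g) = f(g h)."""
    index = {g: i for i, g in enumerate(elements)}
    n = len(elements)
    reps = {}
    for h in elements:
        M = [[0.0] * n for _ in range(n)]
        for g in elements:
            M[index[g]][index[mul(g, h)]] = 1.0
        reps[h] = M
    return reps

def group_filter_error(elements, mul, Cs, Cy):
    """Solve (III.52) on the given group and return e(T_w) = <I - T_w, Cs>."""
    n = len(elements)
    e = next(x for x in elements if all(mul(x, y) == y for y in elements))
    inv = {x: next(y for y in elements if mul(x, y) == e) for x in elements}
    R = regular_rep(elements, mul)
    a = [[inner(Cy, R[mul(inv[h], g)]) for h in elements] for g in elements]
    b = [inner(Cs, R[g]) for g in elements]
    c = solve(a, b)
    Tw = [[sum(c_h * R[h][i][j] for c_h, h in zip(c, elements)) for j in range(n)]
          for i in range(n)]
    I_minus_Tw = [[(1.0 if i == j else 0.0) - Tw[i][j] for j in range(n)] for i in range(n)]
    return inner(I_minus_Tw, Cs)

N, RHO, NOISE = 6, 0.9, 0.5              # hypothetical block length and statistics
Cs = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]
Cy = [[Cs[i][j] + (NOISE if i == j else 0.0) for j in range(N)] for i in range(N)]

groups = {
    "Z6": (list(range(N)), lambda x, y: (x + y) % N),
    "S3": (list(permutations(range(3))), lambda p, q: tuple(p[q[i]] for i in range(3))),
}
errors = {name: group_filter_error(els, op, Cs, Cy) for name, (els, op) in groups.items()}

W = [solve(Cy, Cs[j]) for j in range(N)]                 # Wiener filter, row by row
e_W = sum(Cs[i][i] for i in range(N)) - inner(W, Cs)     # e(W) = <I - W, Cs>
print(e_W, errors, min(errors, key=errors.get))
```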

The rather meagre evidence available to date, especially the computer experiment reported by Trachtenberg (Reference [44] of Chapter III) for the case where the signal s is derived from a first-order Markov process, suggests that filters over nonabelian groups can indeed lower the mean square error of Wiener filtering by several percent relative to that achieved with the DFT.

We  might  also  emphasize  here  that  group  filters  can  serve  to  approximate  any  filter,  not  just 
the  Wiener  filter  W.  We  have  already  noted  that  most  filters  derived  by  optimizing  some  Hilbert 
space performance criterion (Wiener, Gauss-Markov, maximum likelihood, projection filter, minimax,
pseudoinverse, etc.) tend to be computationally intensive. However, such filters usually have
finite  dimensional  domain,  as  the  measurement  operator  A  in  Equation  (II.4)  is  of  finite  rank 
when  there  is  a  finite  number  of  observations,  and  so  there  is  the  general  possibility  of 
suboptimal  approximations  to  each  of  these  by  a  group  filter.  In  the  course  of  such  investigations 
we  might  also  expect  to  clarify  the  relative  efficiencies  of  the  direct  and  indirect  methods  of 
Section  III.5  for  actually  obtaining  group  filters  that  optimize  a  particular  performance  measure. 
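As a concrete instance of approximating an arbitrary filter by a group filter, the following minimal sketch projects a given N x N filter matrix F onto the span of the regular representation matrices L_h in the plain Frobenius norm. Both the criterion and the example are assumptions for illustration: the Frobenius norm ignores the signal and noise statistics, and it is not claimed to be either the direct or the indirect method of Section III.5. The helper names are hypothetical, and the random matrix A merely stands in for a measurement operator whose pseudoinverse is the filter to be approximated.

```python
import numpy as np

def regular_rep(mul):
    """Left regular representation: L_h sends basis vector e_g to e_{h*g}."""
    n = len(mul)
    mats = []
    for h in range(n):
        L = np.zeros((n, n))
        for g in range(n):
            L[mul[h][g], g] = 1.0
        mats.append(L)
    return mats

def nearest_group_filter(F, mats):
    """Project an arbitrary N x N filter F onto span{L_h} in the Frobenius norm.

    Distinct L_h are permutation matrices with disjoint supports, so
    trace(L_g^T L_h) = N if g == h else 0, and the coefficients decouple.
    """
    N = F.shape[0]
    coeffs = np.array([np.trace(L.T @ F) / N for L in mats])
    return sum(a * L for a, L in zip(coeffs, mats)), coeffs

# Example: approximate a pseudoinverse filter by a cyclic group filter
rng = np.random.default_rng(0)
N = 6
A = rng.standard_normal((N, N))                    # stand-in measurement operator
F = np.linalg.pinv(A)
mats = regular_rep([[(a + b) % N for b in range(N)] for a in range(N)])
T, coeffs = nearest_group_filter(F, mats)
print(np.linalg.norm(F - T) / np.linalg.norm(F))   # relative approximation error
```

Because distinct L_h have disjoint supports, this unweighted projection needs no linear solve; a statistically weighted criterion, such as the one behind the Wiener group filter, generally couples the coefficients and does.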

In summary, we have presented an overview of existing and likely applications of group
theory to various problems of signal processing and modeling. This effort is to be viewed as
another in a long, ongoing series of group theory applications to various scientific and technical
fields. The role of group theory in physics, chemistry, crystallography, etc., is, of course, one of
long  and  honorable  standing,  tracing  back  to  the  seminal  work  of  Weyl  and  Wigner.  More 
recently  we  can  see  the  infiltration  of  groups  into  statistical  research  [3,  4,  5];  this  is  in  addition 
to  the  well-developed  group  role  in  time  series  models  discussed  in  Chapter  II  above. 

There  is  also  a  large  body  of  material  in  the  engineering  literature  that  centers  around  the 
use of finite (Galois) fields. For present purposes we want to point out just two areas that are
particularly  related  to  the  general  theme  of  this  report:  number  theoretic  transforms  and  algebraic 
coding  theory,  especially  group  codes.  The  former  are  essentially  group  transforms  defined  on 
cyclic subgroups of the multiplicative group of a finite field, say GF(q). Naturally the lengths of
such  transforms  are  not  arbitrary  for  a  given  integer  q  (necessarily  a  prime  power),  but  are 
restricted  to  the  divisors  of  q  -  1.  Such  transforms  can  be  used,  together  with  their  associated 
fast algorithms, to cyclically convolve integer sequences without round-off or overflow problems,
and  thus  offer  another  approach  to  the  fast  FIR  filtering  and  correlation  of  general  real  or 
complex  data,  after  temporary  rescaling.  The  recent  survey  by  Blahut  [6]  provides  a  nice 
exposition  of  these  ideas,  and  also  discusses  some  issues  of  coding  theory,  such  as  the  use  of 
number  theoretic  transforms  in  a  given  field,  finite  or  not,  to  define  Reed-Solomon  codes 
(“frequency domain coding”).
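A minimal sketch of cyclic convolution by a number theoretic transform, assuming a prime modulus p (so that GF(p) is simply the integers mod p) and a transform length N dividing p - 1. The root w found below generates a cyclic subgroup of order N of the multiplicative group, and the transform is the naive O(N^2) sum rather than one of the fast algorithms mentioned above. Outputs are exact only when the true convolution values lie in [0, p); larger values wrap modulo p, which is why data must first be rescaled into range. The function names are hypothetical.

```python
def prime_factors(n):
    """Distinct prime factors of n (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def root_of_unity(n, p):
    """An element of multiplicative order exactly n in GF(p); requires n | p - 1."""
    if (p - 1) % n:
        raise ValueError("transform length must divide p - 1")
    for x in range(2, p):
        w = pow(x, (p - 1) // n, p)
        if all(pow(w, n // q, p) != 1 for q in prime_factors(n)):
            return w
    raise ValueError("no root found; is p prime?")

def ntt(x, w, p):
    """Naive transform X[k] = sum_j x[j] * w^(j*k) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p for k in range(n)]

def cyclic_convolve(x, y, p):
    """Cyclic convolution of integer sequences, computed exactly modulo p."""
    n = len(x)
    w = root_of_unity(n, p)
    X, Y = ntt(x, w, p), ntt(y, w, p)
    Z = [(a * b) % p for a, b in zip(X, Y)]
    w_inv = pow(w, p - 2, p)        # inverses via Fermat's little theorem
    n_inv = pow(n, p - 2, p)
    return [(n_inv * z) % p for z in ntt(Z, w_inv, p)]

# p = 257 is prime and N = 8 divides p - 1 = 256
print(cyclic_convolve([1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 0, 0, 0, 0, 0], 257))
# -> [5, 16, 34, 52, 45, 28, 0, 0]
```

Zero-padding both inputs to length 8 makes the cyclic convolution coincide with the ordinary (linear) convolution of [1, 2, 3, 4] and [5, 6, 7], which is one common way such transforms serve FIR filtering.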

The  concept  of  group  code  was  introduced  by  Slepian  in  1956,  and  studied  in  a  series  of 
papers,  of  which  we  just  cite  [7,  8].  Originally,  only  binary  channels  were  considered,  and  so  a 
group  code  was  defined  as  a  subgroup  of  a  dyadic  group  (in  our  terminology).  With  the 
geometric view that elements of the dyadic group of order 2^n correspond to vertices of the unit
cube  in  real  n-space,  one  could  associate  with  a  group  code  a  (finite)  subgroup  of  the 
n-dimensional  orthogonal  group  which  acts  on  the  group  code.  The  group  code  is  an  alphabet 
for  describing  the  input  to  and  output  from  the  channel.  The  basic  problem  is  to  optimally 
encode  data  by  means  of  the  alphabet  so  as  to  minimize  a  mean  square  error  criterion.  This 
error  will,  of  course,  depend  on  the  prior  distribution  over  the  data  (this  is  often  assumed  to  be 
uniform), and on the channel transition probabilities. Solutions to this problem (e.g., [9] and its
references) utilize the dual group and group transform. This circle of problems has been extended
to more general group codes [subspaces of GF(q)^n, for arbitrary finite fields GF(q)] and, more recently, to
the possible use of nonabelian group codes which, in some cases, are already known to yield
better  performance  than  abelian  group  codes.  We  see  here  a  strong  parallel  with  the  situations 
discussed  in  Section  III.5  above,  where  nonabelian  group  filters  show  promise  of  outperforming 
the  more  conventional  ones  based  on  the  DFT. 
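To make the geometric picture concrete, the minimal sketch below generates a binary group code as the subgroup of the dyadic group (Z_2)^n spanned by a few generator words, maps it to vertices of the cube {-1, +1}^n by b -> (-1)^b, and checks that the diagonal sign-flip matrices diag((-1)^c), one per codeword c, form a finite subgroup of the orthogonal group that permutes the code's vertices. The generator words are arbitrary, and this sign-flip group is only the simplest group acting on the code; it is an illustrative choice, not necessarily the one used in [7, 8]. The last function shows the dual-group side: it lists the characters of (Z_2)^n that are trivial on the code (the annihilator), which for binary codes is the familiar dual code. All names are hypothetical.

```python
from itertools import product
import numpy as np

def span_gf2(generators):
    """Subgroup of the dyadic group (Z_2)^n generated by the given binary words."""
    n = len(generators[0])
    code = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                word = [(a + b) % 2 for a, b in zip(word, g)]
        code.add(tuple(word))
    return code

def to_cube(word):
    """Map a binary word to a vertex of the cube {-1, +1}^n via b -> (-1)^b."""
    return np.array([(-1) ** b for b in word])

def sign_flips_preserve(code):
    """Check that each matrix diag((-1)^c), c in the code, permutes the code's vertices."""
    vertices = {tuple(to_cube(w)) for w in code}
    for c in code:
        D = np.diag(to_cube(c))                    # orthogonal: D @ D.T = I
        if {tuple(D @ to_cube(w)) for w in code} != vertices:
            return False
    return True

def annihilator(code):
    """Characters chi_u(x) = (-1)^(u.x) of (Z_2)^n that equal 1 on every codeword."""
    n = len(next(iter(code)))
    return {u for u in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(u, c)) % 2 == 0 for c in code)}

code = span_gf2([(1, 1, 0, 0), (0, 1, 1, 0)])
print(sorted(code))         # -> [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 0, 0)]
print(sign_flips_preserve(code))                   # -> True
print(sorted(annihilator(code)))  # -> [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1)]
```

The sign flips preserve the code because (-1)^c multiplied entrywise by (-1)^w equals (-1)^(c XOR w), and c XOR w is again a codeword.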

In  conclusion,  we  have  indicated  many  and  varied  applications  of  group  theory  and  the 
associated  harmonic  analysis,  at  various  levels  of  mathematical  sophistication,  to  assorted 
engineering  problems,  primarily  of  a  signal  processing  nature.  We  predict  that  group  theory  will 
eventually assume as fundamental a role here as, say, algebraic and differential geometry already has
in the companion field of control theory.

Let  us  close  by  recalling  the  opinion  of  E.  T.  Bell,  the  well-known  chronicler  of 
mathematical  history: 

“Wherever groups disclosed themselves, or could be introduced, simplicity and harmony crystallized out of comparative chaos. . . .”